MATERIALS FOR THE PRODUCTION OF
NANOMETER STRUCTURES AND USE THEREOF

FIELD OF THE INVENTION

The present invention pertains to nanostructures,
i.e., nanometer-sized structures useful in the construction
of microscopic and macroscopic structures. In particular,
the present invention pertains to nanostructures based on
bacteriophage T4 tail fiber proteins and variants thereof.

BACKGROUND TO THE INVENTION 

While the strength of most metallic and ceramic-based
materials derives from the theoretical bonding
strengths between their component molecules and crystallite
surfaces, it is significantly limited by flaws in their
crystal or glass-like structures. These flaws are usually
inherent in the raw materials themselves or developed during
fabrication and are often expanded due to exposure to
environmental stresses.

The emerging field of nanotechnology has made the
limitations of traditional materials more critical. The
ability to design and produce very small structures (i.e., of
nanometer dimensions) that can serve complex functions
depends upon the use of appropriate materials that can be
manipulated in predictable and reproducible ways, and that
have the properties required for each novel application.

Biological systems serve as a paradigm for
sophisticated nanostructures. Living cells fabricate proteins
and combine them into structures that are perfectly formed
and can resist damage in their normal environment. In some
cases, intricate structures are created by a process of
self-assembly, the instructions for which are built into the
component polypeptides. Finally, proteins are subject to
proofreading processes that ensure a high degree of quality
control.

Therefore, there is a need in the art for methods
and compositions that exploit these unique features of
proteins to form constituents of synthetic nanostructures.
The need is to design materials whose properties can be
tailored to suit the particular requirements of
nanometer-scale technology. Moreover, since the subunits of
most macrostructural materials (ceramics, metals, fibers,
etc.) are based on the bonding of nanostructural subunits,
the fabrication of appropriate subunits without flaws and of
exact dimensions and uniformity should improve the strength
and consistency of the macrostructures, because the surfaces
are more regular and can interact more closely over an
extended area than larger, more heterogeneous material.

In one aspect, the present invention provides
isolated protein building blocks for nanostructures,
comprising modified tail fiber proteins of bacteriophage T4.
The gp34, 36, and 37 proteins are modified in various ways to
form novel rod structures with different properties.
Specific internal peptide sequences may be deleted without
affecting their ability to form dimers and associate with
their natural tail fiber partners. Alternatively, they may
be modified so that they: interact only with other modified,
and not native, tail fiber partners; exhibit thermolabile
interactions with their partners; or contain additional
functional groups that enable them to interact with
heterologous binding moieties.

The present invention also encompasses fusion
proteins that contain sequences from two or more different
tail fiber proteins. The gp35 protein, which forms an angle
joint, is modified so as to form average angles different
from the natural average angle of 137° (±7°) or 156° (±12°),
and to exhibit thermolabile interactions with its partners.

In another aspect, the present invention provides
nanostructures comprising native and modified tail fiber
proteins of bacteriophage T4. The nanostructures may be
one-dimensional rods, two-dimensional polygons or open or
closed sheets, or three-dimensional open cages or closed solids.

BRIEF DESCRIPTION OF THE DRAWINGS

Figures 1A and 1B show a schematic representation
of the T4 bacteriophage particle (Figure 1A), and a schematic
representation of the T4 bacteriophage tail fiber (Figure
1B).

Figure 2 shows a schematic representation of a unit
rod.

Figures 3A-3D show schematic representations of: a
one-dimensional multi-unit rod joined along the x axis
(Figure 3A); closed simple sheets (Figure 3B); closed
brickwork sheets (Figure 3C); and open brickwork sheets
(Figure 3D).

Figure 4 shows a schematic representation of two
units used to construct porous and solid sheets (top and
bottom), which, when alternately layered, produce a
multi-tiered set of cages as shown.

Figure 5 shows a schematic representation of an 
angled structure having an angle of 120°. 

Figure 6 shows the DNA sequence (SEQ ID NO:1) of
genes 34, 35, 36, and 37 of bacteriophage T4.

Figure 7 shows the amino acid sequences (shown in
single-letter codes) of the gene products of genes 34
(SEQ ID NO:2, ORFX SEQ ID NO:3), 35 (SEQ ID NO:4), 36
(SEQ ID NO:5), and 37 (SEQ ID NO:6) of bacteriophage T4. The
amino acid sequences (bottom line of each pair) are aligned
with the nucleotide sequences (top line of each pair). It is
noted that the deduced protein sequence of gene 35 (from NCBI
database) is not believed to be accurate.

Figures 8A-8B show a schematic representation of:
the formation of a P37 dimer initiator from a molecule that
self-assembles into a dimer (Figure 8A); and the formation of
a P37 trimer initiator from a molecule that self-assembles
into a trimer (Figure 8B).

Figure 9 shows a schematic representation of the
formation of the polymer (P37-36)n with an initiator that is
a self-assembling dimer.

DETAILED DESCRIPTION OF THE INVENTION

All patents, patent applications and literature
references cited in the specification are hereby incorporated
by reference in their entirety. In the case of
inconsistencies, the present disclosure, including
definitions, will prevail.

Although the invention is described in terms of
bacteriophage T4 tail fiber proteins, it will be understood
that the invention is also applicable to tail fiber proteins
of other T-even-like phage, e.g., of the T4 family (e.g., T4,
TuIa, TuIb), and T2 family (T2, T6, K3, Ox2, M1, etc.).

DEFINITIONS:

"Nanostructures" are defined herein as structures
of different sizes and shapes that are assembled from
nanometer-sized protein components.

"Chimers" are defined herein as chimeric proteins
in which at least the amino- and carboxy-terminal regions are
derived from different original polypeptides, whether the
original polypeptides are naturally occurring or have been
modified by mutagenesis.

"Homodimers" are defined herein as assemblies of
two substantially identical protein subunits that form a
defined three-dimensional structure.

The designation "gp" denotes a monomeric
polypeptide, while the designation "P" denotes homooligomers.
P34, P36, and P37 are presumably homodimers or homotrimers.

An isolated polypeptide that "consists essentially
of" a specified amino acid sequence is defined herein as a
polypeptide having the specified sequence or a polypeptide
that contains conservative substitutions within that
sequence. Conservative substitutions, as those of ordinary
skill in the art would understand, are ones in which an
acidic residue is replaced by an acidic residue, a basic
residue by a basic residue, or a hydrophobic residue by a
hydrophobic residue. Also encompassed is a polypeptide that
lacks one or more amino acids at either the amino terminus or
carboxy terminus, up to a total of five at either terminus,
when the absence of the particular residues has no
discernable effect on the structure or the function of the
polypeptide in practicing the present invention.
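
The definition above can be illustrated with a minimal sketch.
The membership of each residue class below is an assumption
made for illustration only; the text names the three classes
(acidic, basic, hydrophobic) but does not list their members.
The function names are hypothetical, and the sketch cannot
test the functional condition (no discernable effect on
structure or function), which must be established
experimentally.

```python
# Residue classes: ASSUMED membership, for illustration only.
RESIDUE_CLASSES = {
    "acidic": set("DE"),
    "basic": set("KRH"),
    "hydrophobic": set("AVLIMFW"),
}


def residue_class(aa):
    """Return the class name of a one-letter residue, or None if unclassified."""
    for name, members in RESIDUE_CLASSES.items():
        if aa in members:
            return name
    return None


def is_conservative(original, replacement):
    """True if both residues fall in the same (assumed) class."""
    cls = residue_class(original)
    return cls is not None and cls == residue_class(replacement)


def consists_essentially_of(candidate, reference, max_trim=5):
    """Sequence-level check only: candidate matches reference allowing
    conservative substitutions and up to max_trim residues missing
    from each terminus."""
    for n_trim in range(max_trim + 1):
        for c_trim in range(max_trim + 1):
            core = reference[n_trim:len(reference) - c_trim]
            if len(core) != len(candidate):
                continue
            if all(a == b or is_conservative(b, a)
                   for a, b in zip(candidate, core)):
                return True
    return False
```

For example, under these assumptions a D-to-E replacement
passes the check, while a D-to-S replacement does not.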
The present invention pertains to a new class of
protein building blocks whose dimensions are measured in
nanometers, which are useful in the construction of
microscopic and macroscopic structures. Without wishing to
be bound by theory, it is believed that the basic unit is a
homodimer composed of two identical protein subunits having a
cross-β configuration, although a trimeric structure is also
possible. Thus, as will be apparent, references to a
"homodimer" or "dimerization" as used herein will in many
instances be construed as also referring to a homotrimer or
trimerization. These long, stiff, and stable rod-shaped
units can assemble with other rods using coupling devices
that can be attached genetically or in vitro. The ends of
one rod may attach to different ends of other rods or similar
rods. Variations in the length of the rods, in the angles of
attachment, and in their flexibility characteristics permit
differently-shaped structures to self-assemble in situ. In
this manner the units can self-assemble into predetermined
larger structures of one, two or three dimensions. The
self-assembly can be staged to form structures of precise
dimensions and uniform strength due to the flawless
biological manufacture of the components. The rods can also
be modified by genetic and chemical modifications to form
predetermined specific attachment sites for other chemical
entities, allowing the formation of complex structures.

An important aspect of the present invention is
that the protein units can be designed so that they comprise
rods of different lengths, and can be further modified to
include features that alter their surface properties in
predetermined ways and/or influence their ability to join
with other identical or different units. Furthermore, the
self-assembly capabilities can be expanded by producing
chimeric proteins that combine the properties of two
different members of this class. This design feature is
achieved by manipulating the structure of the genes encoding
these proteins.

As detailed below, the compositions and methods of
the present invention take advantage of the properties of the
natural proteins, i.e., the resulting structures are stiff,
strong, stable in aqueous media, heat resistant, protease
resistant, and can be rendered biodegradable. A large
quantity of units can be fabricated easily in microorganisms.
Furthermore, for ease of automation, large quantities of
parts and subassemblies can be stored and used as needed.

The sequences of the protein subunits are based on
the components of the tail fiber of the T4 bacteriophage of
E. coli. It will be understood that the principles and
techniques can be applied to the tail fibers of other T-even
phages, or other related bacteriophages that have similar
tail and/or fiber structures.

The structure of the T4 bacteriophage tail fiber
(illustrated in Figure 1) can be represented schematically as
follows (N = amino terminus, C = carboxy terminus): N[P34]C -
N[gp35]C - N[P36]C - N[P37]C. P34, P36, and P37 are all
stiff, rod-shaped protein homodimers in which two identical
β-sheets, oriented in the same direction, are fused
face-to-face by hydrophobic interactions between the sheets
juxtaposed with a 180° rotational axis of symmetry through
the long axis of the rod. (The structure will vary if P34,
P36, and P37 are homotrimers.) gp35, by contrast, is a
monomeric polypeptide that attaches specifically to the
N-terminus of P36 and then to the C-terminus of P34 and forms
an angle joint between two rods. During T4 infection of E.
coli, two gp37 monomers dimerize to form a P37 homodimer; the
process of dimerization is believed to initiate near the
C-terminus of P37 and to require two E. coli chaperon
proteins. (A variant gp37 with a temperature sensitive
mutation near the C-terminus used in the present invention
requires only one chaperon, gp57, for dimerization.) Once
dimerized, the N-terminus of P37 initiates the dimerization
of two gp36 monomers to a P36 rod. The joint between the
C-terminus of P36 and the N-terminus of P37 is tight and
stiff but noncovalent. The N-terminus of P36 then attaches
to a gp35 monomer; this interaction stabilizes P36 and forms
the elbow of the tail fiber. Finally, gp35 attaches to the
C-terminus of P34 (which uses gp57 for dimerization). Thus,
self-assembly of the tail fiber is regulated by a
predetermined order of interaction of specific subunits
whereby structural maturation caused by formation of the
first subassembly permits interaction with new (previously
disallowed) subunits. This results in the production of a
structure of exact specifications from a random mixture of
the components.
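
The ordered assembly described above can be sketched as a
small dependency table, in which each subassembly forms only
after its prerequisites exist. This is an illustrative model
only: the table encodes the order stated in the text, while
the function name and the representation of subassemblies as
strings are hypothetical.

```python
# Each entry: (product, prerequisites that must already have formed).
# The order follows the text; chaperon requirements are not modeled.
ASSEMBLY_STEPS = [
    ("P37", set()),                    # two gp37 monomers dimerize
    ("P36-P37", {"P37"}),              # P37 N-terminus initiates gp36 dimerization
    ("gp35-P36-P37", {"P36-P37"}),     # gp35 attaches to the N-terminus of P36
    ("P34", set()),                    # gp34 dimerizes independently
    ("P34-gp35-P36-P37", {"gp35-P36-P37", "P34"}),  # gp35 joins C-terminus of P34
]


def assemble(permitted):
    """Return subassemblies in the order they form, given the set of
    products whose formation is permitted (e.g. not blocked by mutation)."""
    formed = []
    progress = True
    while progress:
        progress = False
        for product, prereqs in ASSEMBLY_STEPS:
            if (product in permitted and product not in formed
                    and prereqs <= set(formed)):
                formed.append(product)
                progress = True
    return formed
```

In this model, blocking the first step (no P37) leaves P34 as
the only dimer that forms, which is consistent with the
statement that later interactions are disallowed until the
earlier subassembly has matured.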

In accordance with the present invention, the genes
encoding these proteins may be modified so as to make rods of
different lengths with different combinations of ends. The
properties of the native proteins are particularly
advantageous in this regard. First, the β-sheet is composed
of antiparallel β-strands with β-bends at the left (L) and
right (R) edges. Second, the amino acid side chains
alternate up and down out of the plane of the sheet. The
first property allows bends to be extended to form symmetric
and specific attachment sites between the L and R surfaces,
as well as to form attachment sites for other structures. In
addition, the core sections of the β-sheet can be shortened
or lengthened by genetic manipulations, e.g., by splicing DNA
regions encoding β-bends, on the same edge of the sheet, to
form new bends that exclude intervening peptides, or by
inserting segments of peptide in an analogous manner by
splicing at bend angles. The second property allows amino
acid side chains extending above and below the surface of the
β-sheet to be modified by genetic substitution or chemical
coupling. Importantly, all of the above modifications are
achieved without compromising the structural integrity of the
rod. It will be understood by one skilled in the art that
these properties allow a great deal of flexibility in
designing units that can assemble into a broad variety of
structures, some of which are detailed below.
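
Why splicing at bends on the same edge preserves the sheet can
be shown with a toy model. The sketch below assumes an
idealized sheet in which strands are numbered consecutively,
bend i joins strand i to strand i+1, and bends alternate
strictly between the L and R edges. The indexing convention
and function names are assumptions for illustration, not a
description of the actual protein.

```python
def bend_edge(i):
    """Edge of bend i, which joins strand i to strand i+1.
    ASSUMED convention: even-numbered bends lie on L, odd on R."""
    return "L" if i % 2 == 0 else "R"


def splice_bends(n_strands, i, j):
    """Delete the strands between bend i and bend j (i < j).
    Both bends must lie on the same edge of the sheet.
    Returns the indices of the strands that remain."""
    if not (0 <= i < j < n_strands - 1):
        raise ValueError("bend indices out of range")
    if bend_edge(i) != bend_edge(j):
        raise ValueError("bends must lie on the same edge of the sheet")
    return list(range(0, i + 1)) + list(range(j + 1, n_strands))


def is_antiparallel(strands):
    """Strand k is taken to run 'up' if k is even and 'down' if odd;
    every pair of neighbors must run in opposite directions."""
    return all((a % 2) != (b % 2) for a, b in zip(strands, strands[1:]))
```

In this model a same-edge splice always removes an even number
of strands, so the strands on either side of the new bend
still run in opposite directions.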

STRUCTURAL UNITS

The rods of the present invention function like
wooden 2x4 studs or steel beams for construction. In this
case, the surfaces are exactly reproducible at the molecular
level and thereby fitted for specific attachments to similar
or different unit rods at fixed joining sites. The surfaces
are also modified to be more or less hydrophilic, including
positively or negatively charged groups, and have protrusions
built in for specific binding to other units or to an
intermediate joint with two receptor sites. The surfaces of
the rod and a schematic of the unit rod are illustrated in
Figure 2. The three dimensions of the rod are defined as: x,
for the back (B) to front (F) dimension; y, for the down (D)
to up (U) dimension; and z, for the left (L) to right (R)
dimension.

One dimensional multi-unit rods can be most readily
assembled from single unit rods joined along the x axis
(Figure 3A), but regular joining of subunits in either of the
other two dimensions will also form a long structure, with
a different cross section than in the x dimension.

Two dimensional constructs are sheets formed by
interaction of rods along any two axes. 1) Closed simple
sheets are formed from surfaces which overlap exactly, along
any two axes (Figure 3B). 2) Closed brickwork sheets are
formed from interaction between units that have exactly
overlapping surfaces in one dimension and a special type of
overlap in the other (Figure 3C). In this case there must be
two different sets of complementary joints spaced with
exactly 1/2 unit distance between them. If they are centered
(i.e., each set 1/4 from the end) then each joint will be in
the center of the units above and below. If they are offset,
then the joint will be offset as well. In this construction,
the complementary interacting sites are schematized by * and
##. If the interacting sites are each symmetric, the
alternating rows can interact with the rods in either
direction. If they are not symmetric, and can only interact
with interacting rows facing in the same or opposite
direction, the sheet will be made of unidirectional rods or
layers of rods in alternating directions. 3) Open brickwork
sheets (or nets) result when the units are separated by more
than one-half unit (Figure 3D). The dimensions of the
openings (or pores) depend upon the distance (dx) separating
the interacting sites and the distance (dy) by which these
sites separate the surfaces.
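
The brickwork geometry can be sketched numerically. The model
below assumes rods of unit length laid end to end in rows,
with alternate rows displaced along the rod axis by a `shift`
parameter. That parameter and the function names are
hypothetical and are introduced only to show where the
end-to-end joints of one row fall on the rods of the adjacent
row. It does not model the binding sites themselves.

```python
from fractions import Fraction


def brickwork_rows(n_rows, rods_per_row, shift=Fraction(1, 2)):
    """Start coordinate of every rod, row by row.
    Odd-numbered rows are displaced by `shift` rod lengths (assumed model)."""
    return [
        [k + (shift if r % 2 else Fraction(0)) for k in range(rods_per_row)]
        for r in range(n_rows)
    ]


def joint_position_in_neighbor(shift):
    """Fractional position, along a rod of the displaced row, at which
    an end-to-end joint of the undisplaced row falls."""
    return (-Fraction(shift)) % 1
```

With a displacement of one-half unit the joints fall at the
center of the rods above and below. Any other displacement
moves the joint off center, and a displacement of zero
reproduces the exact overlap of a closed simple sheet.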

Three dimensional constructs require sterically
compatible interactions between all three surfaces to form
solids. 1) Closed solids can assemble from units that
overlap exactly in all three dimensions (e.g., the exact
overlapping of closed simple sheets). In an analogous
manner, closed brickwork sheets can form closed solids by
overlapping sheets exactly or displaced to bring the
brickwork into the third dimension. This requires an
appropriate set of joints on all three pairs of parallel
faces of the unit. 2) Porous solids are made by joining
open brickwork sheets in various ways. For example, if the
units overlap exactly in the third dimension, a solid is
formed with the array of holes of exact dimensions running
perpendicular to the plane of the paper. If instead a
material is needed with closed spaces, with layers of width
dz (i.e., in the U→D dimension), a simple closed sheet is
layered on the open brickwork sheet to close the openings.
If the overlap of the open brickwork sheet is, e.g., 1/4 unit,
then a rod of length 3/4 units is used to make the sheet.
Joints are then needed in the z dimension. The two units
used to polymerize these alternate layers, and the layers
themselves, are schematized in Figure 4.

All of the above structures are composed of simple
linear rods. A second unit, the angle unit, expands the type
and dimensionality of possible structures. The angle unit
connects two rods at angles different from 180°, akin to an
angle iron. The average angle and its degree of rigidity are
built into this connector structure. For example, the
structure shown in Figure 5 has an angle of 120° and
different specific joining sites at a and at b. The
following are examples of structures that are formed
utilizing angle joints:

1) Open brickwork sheets are expanded and
strengthened in the direction normal to the rod direction by
adding angles perpendicular to the sheet. In this case, a
three dimensional network forms. Attachment of 90° angles to
the ends of the rods makes an angle almost in the plane of
the sheet, allowing new rods added to those angles (which
must have some play out of the plane of the original sheet to
attach in the first place) to form a new sheet, almost
parallel, with an orientation normal to its upper or lower
neighbor.

2) Hexagons are made from a mixture of rods and
angle joints that form 120° angles. In this case, there are
two exclusive sets of joints. Each set is made up of one of
the two ends of the rod and one of the two complementary
sites on the angle. This is a linear structure in the sense
that the hexagon has a direction (either clockwise or
counterclockwise). It can be made into a two dimensional
open net (i.e., a two dimensional honeycomb) by joining the
sides of the hexagons. It can form hexagonal tubes by
joining the top of the hexagon below to the bottom face of
the hexagon above. If the tubes also join by their sides,
they will form an open three dimensional multiple hexagonal
tube.
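
The relation between the joint angle and the polygon it closes
can be checked with elementary geometry. The sketch below
assumes identical, perfectly rigid rods and planar joints, an
idealization that ignores the angular variation noted
elsewhere in this description. The function names are
hypothetical.

```python
import math


def rods_to_close(joint_angle_deg):
    """Number of identical rods needed to close a planar polygon whose
    interior angle equals the joint angle, or None if no whole number
    of rods closes it exactly."""
    turn = 180.0 - joint_angle_deg      # change of heading at each joint
    if turn <= 0:
        return None
    n = 360.0 / turn
    nearest = round(n)
    if nearest >= 3 and abs(n - nearest) < 1e-9:
        return nearest
    return None


def polygon_vertices(joint_angle_deg, n_rods, rod_length=1.0):
    """Walk n_rods rods end to end, turning by (180 - angle) at each
    joint, and return the vertices visited, starting at the origin."""
    x = y = heading = 0.0
    points = [(x, y)]
    turn = math.radians(180.0 - joint_angle_deg)
    for _ in range(n_rods):
        x += rod_length * math.cos(heading)
        y += rod_length * math.sin(heading)
        points.append((x, y))
        heading += turn
    return points
```

Under these assumptions a 120° joint closes after six rods and
a 90° joint after four, whereas a 137° joint does not close a
regular planar polygon.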

3) Helical hexagonal tubes are made analogously to
hexagons but the sixth unit is not joined to the first to
close the hexagon. Instead, the end is displaced from the
plane of the hexagon and the seventh and further units are
added to form a hexagonal tube which can be a spring if there
is little or no adhesive force between the units of the
helix, or a stiff rod if there is such a force to maintain
the close proximity of apposing units.

It will be apparent to one skilled in the art that
the compositions and methods of the present invention also
encompass other polygonal structures such as octagons, as
well as open solids such as tetrahedrons and icosahedrons
formed from triangles and boxes formed from squares and
rectangles. The range of structures is limited only by the
types of angle units and the substituents that can be
engineered on the different axes of the rod units. For
example, other naturally occurring angles are found in the
fibers of bacteriophage T7, which has a 90° angle (Steven et
al., J. Mol. Biol. 200: 352-365, 1988).

DESIGN AND PRODUCTION OF THE ROD PROTEINS

The protein subunits that are used to construct the
nanostructures of the present invention are based on the four
polypeptides that comprise the tail fibers of bacteriophage
T4, i.e., gp34, gp35, gp36 and gp37. The genes encoding
these proteins have been cloned, and their DNA and protein
sequences have been determined (for genes 36 and 37 see Oliver
et al., J. Mol. Biol. 153: 545-568, 1981). The DNA and amino
acid sequences of genes 34, 35, 36 and 37 are set forth in
Figures 6 and 7 below.

Gp34, gp35, gp36, and gp37 are produced naturally
following infection of E. coli cells by intact T4 phage
particles. Following synthesis in the cytoplasm of the
bacterial cell, the gp34, 36, and 37 monomers form
homodimers, which are competent for assembly into maturing
phage particles. Thus, E. coli serves as an efficient and
convenient factory for synthesis and dimerization of the
protein subunits described herein below.

In practicing the present invention, the genes
encoding the proteins of interest (native, modified, or
recombined) are incorporated into DNA expression vectors that
are well known in the art. These circular plasmids typically
contain selectable marker genes (usually conferring
antibiotic resistance to transformed bacteria), sequences
that allow replication of the plasmid to high copy number in




WO 96/11947 



PCT/US95/13023 



E. coli, and a multiple cloning site immediately downstream
of an inducible promoter and ribosome binding site. Examples
of commercially available vectors suitable for use in the
present invention include the pET system (Novagen, Inc.,
Madison, WI) and Superlinker vectors pSE280 and pSE380
(Invitrogen, San Diego, CA).

The strategy is to 1) construct the gene of
interest and clone it into the multiple cloning site; 2)
transform E. coli cells with the recombinant plasmid; 3)
induce the expression of the cloned gene; 4) test for
synthesis of the protein product; and, finally, 5) test for
the formation of functional homodimers. In some cases,
additional genes are also cloned into the same plasmid, when
their function is required for dimerization of the protein of
interest. For example, when wild-type or modified versions
of gp37 are expressed, the bacterial chaperon gene 57 is also
included; when wild-type or modified gp36 is expressed, the
wild-type version or a modified version of the gp37 gene is
included. The modified gp37 should have the capacity to
dimerize and contain an N-terminus that can chaperon the
dimerization of gp36. This method allows the formation of
monomeric gene products and, in some cases, maturation of
monomers to homodimeric rods in the absence of other
phage-induced proteins normally present in a T4-infected
cell.
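
The co-expression rule in the preceding paragraph can be
written as a small lookup. The table holds only the two
examples given in the text. Treating the requirement as
transitive (expressing gp36 brings in the gp37 gene, which in
turn brings in gene 57) is an assumption of this sketch, and
the function name is hypothetical.

```python
# Helper genes named in the text, keyed by the protein being expressed.
HELPERS = {
    "gp37": ["gene 57"],   # chaperon gene included when gp37 is expressed
    "gp36": ["gp37"],      # gp37 gene included when gp36 is expressed
}


def genes_to_clone(target, helpers=HELPERS):
    """Return the target followed by every helper gene it needs.
    ASSUMPTION: helper requirements are followed transitively."""
    ordered = []
    stack = [target]
    while stack:
        gene = stack.pop()
        if gene not in ordered:
            ordered.append(gene)
            stack.extend(reversed(helpers.get(gene, [])))
    return ordered
```

A protein with no entry in the table, such as gp35 here, is
returned on its own.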

Steps 1-4 of the above-defined strategy are
achieved by methods that are well known in the art of
recombinant DNA technology and protein expression in
bacteria. For example, in step 1, restriction enzyme
cleavage at multiple sites, followed by ligation of
fragments, is used to construct deletions in the internal rod
segment of gp34, 36, and 37 (see Example 1 below).
Alternatively, a single or multiple restriction enzyme
cleavage, followed by exonuclease digestion (EXO-SIZE, New
England Biolabs, Beverly, MA), is used to delete DNA
sequences in one or both directions from the initial cleavage
site; when combined with a subsequent ligation step, this
procedure produces a nested set of deletions of increasing
sizes. Similarly, standard methods are used to recombine DNA
segments from two different tail fiber genes, to produce
chimeric genes encoding fusion proteins (called "chimers" in
this description). In general, this last method is used to
provide alternate N- or C-termini and thus create novel
combinations of ends that enable new patterns of joining of
different rod segments. A representative of this type of
chimer, the fusion of gp37-36, is described in Example 2.
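
The idea of a nested set of deletions can be illustrated on a
plain string. The sketch below is a simplification: it removes
fixed-size increments from a single cleavage site, whereas
real exonuclease digestion yields a distribution of end points
that depends on digestion time. The function name and
parameters are hypothetical, and reading frame is not
considered.

```python
def nested_deletions(sequence, cut_site, step, max_deleted,
                     bidirectional=False):
    """Return variants of `sequence` with progressively larger deletions.

    Digestion starts at index `cut_site` and proceeds rightward, or in
    both directions if `bidirectional` is True. Each variant removes
    `step` more positions per digested direction than the previous one,
    up to `max_deleted`. The digested ends are then rejoined (ligation).
    """
    variants = []
    for size in range(step, max_deleted + 1, step):
        left = max(cut_site - size, 0) if bidirectional else cut_site
        right = min(cut_site + size, len(sequence))
        variants.append(sequence[:left] + sequence[right:])
    return variants
```

Each successive variant lacks everything the previous one
lacks plus one more increment, which is what makes the set
nested.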

The preferred hosts for production of these proteins (Step 2)
are E. coli strains BL21(DE3) and BL21(DE3/pLysS) (available
commercially from Novagen, Madison, WI), although other
compatible recA strains, such as HMS174(DE3) and
HMS174(DE3/pLysS), can be used. Transformation with the
recombinant plasmid (Step 2) is accomplished by standard
methods (Sambrook, J., Molecular Cloning, Cold Spring Harbor
Laboratories, Cold Spring Harbor, NY; this is also the source
for standard recombinant DNA methods used in this invention).
Transformed bacteria are selected by virtue of their
resistance to antibiotics, e.g., ampicillin or kanamycin. The
method by which expression of the cloned tail fiber genes is
induced (Step 3) depends upon the particular promoter used.
A preferred promoter is plac (with a lacIq on the vector to
reduce background expression), which can be regulated by the
addition of isopropylthiogalactoside (IPTG). A second
preferred promoter is pT7φ10, which is specific to T7 RNA
polymerase and is not recognized by E. coli RNA polymerase.
T7 RNA polymerase, which is resistant to rifamycin, is
encoded on the defective lambda DE lysogen in the E. coli
BL21 chromosome. T7 polymerase in BL21(DE3) is
super-repressed by the lacIq gene in the plasmid and is
induced and regulated by IPTG.

Typically, a culture of transformed bacteria is
incubated with the inducer for a period of hours, during
which the synthesis of the protein of interest is monitored.
In the present instance, extracts of the bacterial cells are
prepared, and the T4 tail fiber proteins are detected, for
example, by SDS-polyacrylamide gel electrophoresis.

Once the modified protein is detected in bacterial
extracts, it is necessary to ascertain whether or not it
forms appropriate homodimers (Step 5). This is accomplished
initially by testing whether the protein is recognized by an
antiserum specific to the mature dimerized form of the
protein.

Tail fiber-specific antisera are prepared as
described (Edgar, R.S. and Lielausis, I., Genetics 52: 1187,
1965; Ward et al., J. Mol. Biol. 54:15, 1970). Briefly, whole
T4 phage are used as an immunogen; optionally, the resulting
antiserum is then adsorbed with tail-less phage particles,
thus removing all antibodies except those directed against
the tail fiber proteins. In a subsequent step, different
aliquots of the antiserum are adsorbed individually with
extracts that each lack a particular tail fiber protein. For
example, if an extract containing only tail fiber components
P34, gp35, and gp36 (derived from a cell infected with a
mutant T4 lacking a functional gp37 gene) is used for
adsorption, the resulting antiserum will recognize only
mature P37 and dimerized P36-P37. A similar approach may be
used to prepare individual antisera that recognize only
mature (i.e., homodimerized) P34 and P36 by adsorbing with
extracts containing distal half tail fibers or P34, gp35 and
P37, respectively. An alternative is to raise antibody
against purified tail fiber halves, e.g., P34 and
gp35-P36-P37. Anti-gp35-P36-P37 can then be adsorbed with
P36-P37 to produce anti-gp35, and anti-P36 can be produced by
adsorption with P37 and gp35. Anti-P37, anti-gp35, and
anti-P34 can also be produced directly by using purified P37,
gp35, and P34 as immunogens. Another approach is to raise
specific monoclonal antibodies against the different tail
fiber components or segments thereof.
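
The adsorption scheme amounts to set subtraction: whatever the
adsorbing extract contains is removed from what the antiserum
recognizes. The sketch below is a deliberately coarse model.
Each label stands for the group of epitopes specific to that
form, with "P36" denoting epitopes present only on the
dimerized protein. These labels and the function name are
assumptions for illustration.

```python
def adsorb(antiserum, extract):
    """Specificities left in an antiserum after adsorption with an
    extract, modeled as set difference."""
    return set(antiserum) - set(extract)


# Anti-tail-fiber serum (after adsorption with tail-less particles).
# ASSUMED labels: "gp36" = epitopes shared with the monomer,
# "P36" = epitopes found only on the mature dimer.
anti_tail_fiber = {"P34", "gp35", "gp36", "P36", "P37"}

# Extract from cells infected with a mutant lacking functional gene 37.
extract_without_gp37 = {"P34", "gp35", "gp36"}

remaining = adsorb(anti_tail_fiber, extract_without_gp37)
```

In this model `remaining` holds only the specificities for
mature P37 and for dimerized P36, in line with the example in
the text. The same subtraction applied to an antiserum against
gp35-P36-P37 and a P36-P37 extract leaves anti-gp35.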

Specific antibodies to subunits or tail parts are
used in any of the following ways to detect appropriately
homodimerized tail fiber proteins: 1) Bacterial colonies are
screened for those expressing mature tail fiber proteins by
directly transferring the colonies, or, alternatively,
samples of lysed or unlysed cultures, to nitrocellulose
filters, lysing the bacterial cells on the filter if
necessary, and incubating with specific antibodies.
Formation of immune complexes is then detected by methods
widely used in the art (e.g., secondary antibody conjugated
to a chromogenic enzyme or radiolabelled Staphylococcal
Protein A). This method is particularly useful to screen
large numbers of colonies, e.g., those produced by EXO-SIZE
deletion as described above. 2) Bacterial cells expressing
the protein of interest are first metabolically labelled with
35S-methionine, followed by preparation of extracts and
incubation with the antiserum. The immune complexes are then
recovered by incubation with immobilized Protein A followed
by centrifugation, after which they may be resolved by
SDS-polyacrylamide gel electrophoresis.

An alternative competitive assay for testing
whether internally deleted tail fiber proteins that do not
permit phage infection nonetheless retain the ability to
dimerize and associate with their appropriate partners
utilizes an in vitro complementation system. 1) A bacterial
extract containing the modified protein of interest, as
described above, is mixed with a second extract prepared from
cells infected with a T4 phage that is mutant in the gene of
interest. 2) After several hours of incubation, a third
extract is added that contains the wild-type version of the
protein being tested, and incubation is continued for several
additional hours. 3) Finally, the extract is titered for
infectious phage particles by infecting E. coli and
quantifying the phage plaques that result. A modified tail
fiber protein that is correctly dimerized and able to join
with its partners is incorporated into tail fibers in a
non-functional manner in Step 1, thereby preventing the
incorporation of the wild-type version of the protein in Step
2; the result is a reduction in the titer of the resulting
phage sample. By contrast, if the modified protein is unable
to dimerize and thus form proper N- and/or C-termini, it will
not be incorporated into phage particles in Step 1, and thus
will not compete with assembly of intact phage particles in
Step 2; the phage titer should thus be equivalent to that
observed when no modified protein is added in Step 1 (a
negative control).
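
The readout of the competition assay is a comparison of two
titers, which can be sketched as follows. The cut-off ratio
used to call a reduction is a hypothetical parameter chosen
for illustration; the text does not specify one, and in
practice it would depend on assay variability.

```python
def interpret_titer(titer_with_modified, titer_control, threshold=0.5):
    """Classify a modified protein from two phage titers.

    titer_control: titer when no modified protein is added in Step 1.
    threshold: ASSUMED cut-off ratio below which the titer counts as
    reduced; not taken from the source.
    """
    if titer_control <= 0:
        raise ValueError("control titer must be positive")
    ratio = titer_with_modified / titer_control
    if ratio < threshold:
        return "competes"          # dimerizes and joins its partners
    return "does not compete"      # titer equivalent to the control
```

For instance, a hundredfold drop relative to the control would
be classed as competition, while a titer at 90% of the control
would not.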

Another way to test whether chimers and
internally deleted tail fiber proteins retain the ability to
dimerize and associate with their appropriate partners is
performed in vivo. The assay detects the ability of such chimers
and deleted proteins to compete with normal phage parts for
assembly, thus reducing the burst size of a wild-type phage
infecting the same host cell in which the chimers or deleted
proteins are recombinantly expressed. Thus, expression from
an expression vector encoding the chimer or deleted protein
is induced inside a cell, which is then infected by a
wild-type phage. Inhibition of wild-type phage production
demonstrates the ability of the recombinant chimer or protein
to associate with the appropriate tail fiber proteins of the
phage.

The above-described methods are used, alone and in
combination, in the design and production of different types
of modified tail fiber proteins. For example, a preliminary
screen of a large number of bacterial colonies for those
expressing a properly dimerized protein will identify
positive colonies, which can then be individually tested by
in vitro complementation.

Non-limiting examples of novel proteins that are 
encompassed by the present invention include: 

1) Internally deleted gp34, 36, and 37
polypeptides (See Example 1 below);

2) A C-terminally truncated gp36 fused to the
N-terminus of N-terminally truncated gp37;

3) A fusion between gp36 and gp37 in which gp37 is
N-terminal to gp36 (i.e., in reverse of the natural order),
termed herein "gp37-36 chimer" (See Example 2 below);

4) A fusion between gp34 and gp36 in which gp36 is
N-terminal to gp34 (i.e., in reverse of the natural order),
termed herein "gp36-34 chimer";

5) A variant of gp36 in which the C-terminus is
mutated such that it lacks the capability to interact with
(and dimerize in response to) the N-terminus of wild-type
P37, termed herein "gp36*";

6) A variant of gp37 in which the N-terminus is
mutated such that it forms a P37 that lacks the capability to
interact with the C-terminus of wild-type gp36, termed herein
"*P37";

7) Variants of gp36* and *P37 that can interact
with each other, but not with gp36 or P37.

8) A variant "P37-36 chimer" in which the gp36
moiety is derived from the variant as in 5), i.e., "P37-36*".
(For 5-8, See Example 3 below.)

9) A variant "P37-36 chimer" in which the gp37
moiety is derived from the variant as in 6) above, i.e.,
"*P37-36".

10) A variant P37-36 chimer, *P37-P36*, in which
the gp36 and gp37 moieties are derived from the variants in
7).

11) A fusion between gp36 and gp34 in which gp36
sequences are placed N-terminal to gp34, the dimer of which
is termed herein "P36-34 chimer";

12) Variants of gp35 that form average angles
different from 137° or 158° (the native angle) e.g., less
than about 125° or more than about 145° under conditions
wherein the wild-type gp35 protein forms an angle of 137°
when combined with the P34 and P36-P37 dimers, and/or exhibit
more or less flexibility than the native polypeptide;

13) Variants of gp34, 35, 36 and 37 that exhibit
thermolabile interactions or other variant specific
interactions with their cognate partners; and

14) Variants of gp37 in which the C-terminal
domain of the polypeptide is modified to include sequences
that confer specific binding properties on the entire
molecule, e.g., sequences derived from avidin that recognize
biotin, sequences derived from immunoglobulin heavy chain
that recognize Staphylococcal A protein, sequences derived
from the Fab portion of the heavy chain of monoclonal
antibodies to which their respective Fab light chain
counterparts could attach and form an antigen-binding site,
immunoactive sequences that recognize specific antibodies, or
sequences that bind specific metal ions. These ligands may
be immobilized to facilitate purification and/or assembly.

In specific embodiments, the chimers of the
invention comprise a portion consisting of at least the first
10 (N-terminal) amino acids of a first tail fiber protein
fused via a peptide bond to a portion consisting of at least
the last 10 (C-terminal) amino acids of a second tail fiber
protein. The first and second tail fiber proteins can be the
same or different proteins. In another embodiment, the
chimers comprise an amino acid portion in the range of the
first 10-60 amino acids from a tail fiber protein fused to an
amino acid portion in the range of the last 10-60 amino acids
from a second tail fiber protein. In another embodiment,
each amino acid portion is at least 20 amino acids of the
tail fiber protein. The chimers comprise portions, i.e., not
full-length tail fiber proteins, fused to one another. In a
preferred aspect, the first tail fiber protein portion of the
chimer is from gp37, and the second tail fiber protein
portion is from gp36. Such a chimer (gp37-36 chimer), after
oligomerization to form P37-36, can polymerize to other
identical oligomers. A gp36-34 chimer, after oligomerization
to form P36-34, can bind to gp35, and this unit can then
polymerize. In another embodiment, the first portion is from
gp37, and the second portion is from gp34. In a preferred
aspect, the chimers of the invention are made by insertions
or deletions within a β turn of the β structure of the tail
fiber proteins. Most preferably, insertion into a tail
fiber sequence, or fusion to another tail fiber protein
sequence (preferably via manipulation at the recombinant DNA
level to produce the desired encoded protein), is done so that
sequences in β turns on the same edge of the β-sheet are
joined.

In addition to the above-described chimers,
nanostructures of the invention can also comprise tail fiber
protein deletion constructs that are truncated at one end,
e.g., are lacking an amino- or carboxy-end (of at least 5 or
10 amino acids) of the molecule. Such molecules truncated at
the amino-terminus, e.g., truncated gp37, gp34, or gp36,
can be used to "cap" a nanostructure, since, once
incorporated, they will terminate polymerization. Such
molecules preferably comprise a fragment of a tail fiber
protein lacking at least the first 10, 20, or 60
amino-terminal amino acids.

In order to change the length of the rod component
proteins as desired, portions of the same or different tail
fiber proteins can be inserted into a tail fiber chimer to
lengthen the rod, or be deleted from a chimer to shorten the
rod.

ASSEMBLY OF INDIVIDUAL ROD COMPONENTS INTO NANOSTRUCTURES

Expression of the proteins of the present invention
in E. coli as described above results in the synthesis of
large quantities of protein, and allows the simultaneous
expression and assembly of different components in the same
cells. The methods for scale-up of recombinant protein
production are straightforward and widely known in the art,
and many standard protocols can be used to recover native and
modified tail fiber proteins from a bacterial culture.

In a preferred embodiment, native (nonrecombinant)
gp35 is isolated for use by growing up a bacteriophage T4
having an amber mutation in gene 36, in a su° bacterial
strain (not an amber suppressor), and isolating gp35 from the
resulting culture by standard methods.

P34, P36-P37, P37, and chimers derived from them
are purified from E. coli cultures as mature dimers. Gp35 and
variants thereof are purified as monomers. Purification is
achieved by the following procedures or combinations thereof,
using standard methods: 1) chromatography on molecular
sieve, ion-exchange, and/or hydrophobic matrices;
2) preparative ultracentrifugation; and 3) affinity
chromatography, using as the immobilized ligand specific
antibodies or other specific binding moieties. For example,
the C-terminal domain of P37 binds to the lipopolysaccharide
of E. coli B. Other T4-like phages have P37 analogues that
bind other cell surface components such as OmpF or TSX
protein. Alternatively, if the proteins have been engineered
to include heterologous domains that act as ligands or
binding sites, the cognate partner is immobilized on a solid
matrix and used in affinity purification. For example, such
a heterologous domain can be biotin, which binds to a
streptavidin-coated solid phase.

Alternatively, several components are co-expressed
in the same bacterial cells, and sub-assemblies of larger
nanostructures are purified subsequent to limited in vivo
assembly, using the methods enumerated above.

The purified components are then combined in vitro
under conditions where assembly of the desired nanostructure
occurs at temperatures between about 4°C and about 37°C, and
at pHs between about 5 and about 9. For a given
nanostructure, optimal conditions for assembly (i.e., type
and concentration of salts and metal ions) are easily
determined by routine experimentation, such as by changing
each variable individually and monitoring formation of the
appropriate products.
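
Changing each variable individually amounts to a one-factor-at-a-time screen. The sketch below lists such a set of trial conditions. The temperature and pH end points are taken from the ranges given above; the mid-range baseline and the NaCl concentrations are hypothetical placeholders, and all names are illustrative.

```python
# Baseline condition (hypothetical mid-range values).
BASELINE = {"temperature_c": 25, "ph": 7.0, "nacl_mm": 100}

# Levels to try for each variable; salt levels are placeholders.
LEVELS = {
    "temperature_c": [4, 25, 37],
    "ph": [5.0, 7.0, 9.0],
    "nacl_mm": [50, 100, 200],
}


def one_factor_at_a_time(baseline, levels):
    """Return the baseline plus trials that each change exactly one variable."""
    conditions = [dict(baseline)]
    for name, values in levels.items():
        for value in values:
            if value == baseline[name]:
                continue  # the baseline itself is already listed
            trial = dict(baseline)
            trial[name] = value
            conditions.append(trial)
    return conditions
```

Each trial would then be scored by monitoring formation of the expected assembly product, for example by gel electrophoresis.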

Alternatively, one or more crude bacterial extracts
may be prepared, mixed, and assembly reactions allowed to
proceed prior to purification.

In some cases, one or more purified components
assemble spontaneously into the desired structure, without
the necessity for initiators. In other cases, an initiator
is required to nucleate the polymerization of rods or sheets.
This offers the advantage of localizing the assembly process
(i.e., if the initiator is immobilized or otherwise
localized) and of regulating the dimensions of the final
structure. For example, rod components that contain a
functional P36 C-terminus require a functional P37 N-terminus
to initiate rod formation stoichiometrically; thus, altering
the relative amount of initiator and rod component will
influence the average length of rod polymer. If the ratio of
rod component to initiator is n, the average rod will be
approximately
(P37-36)n - N-terminus P37-P37 C-terminus.
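
The stoichiometric relationship can be illustrated with a short calculation. This is an idealized sketch that assumes every initiator nucleates exactly one rod and every rod component is incorporated; the function name and the example amounts are hypothetical.

```python
def average_units_per_rod(rod_component_dimers, initiator_dimers):
    """Return n, the average number of P37-36 units per initiated rod.

    Idealized: assumes one rod per initiator and complete incorporation
    of the rod component, which real assembly reactions may not achieve.
    """
    if initiator_dimers <= 0:
        raise ValueError("an initiator is required to nucleate rods")
    return rod_component_dimers / initiator_dimers
```

Under these assumptions, halving the amount of initiator at a fixed amount of rod component doubles the expected average rod length.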

In still other cases, the final nanostructure is
composed of two or more components that cannot self-assemble
individually but only in combination with each other. In
this situation, alternating cycles of assembly can be staged
to produce final products of precisely defined structure (see
Example 6B below).

When an immobilized initiator is used, it may be
desirable to remove the polymerized unit from the matrix
after staged assembly. For this purpose specialized
initiators are engineered so that the interaction with the
first rod component is rendered reversibly thermolabile (see
Example 5 below). In this way, the polymer can be easily
separated from the matrix-bound initiator, thereby
permitting: 1) easy preparation of stock solutions of uniform
parts or subassemblies, and 2) re-use of the matrix-bound
initiator for multiple cycles of polymer initiation, growth,
and release.

In an embodiment in which a nanostructure is
assembled that is attached to a solid matrix via gp34 (or
P34), one way in which to detach the nanostructure to bring
it into solution is to use a mutant (thermolabile) gp34 that
can be made to detach upon exposure to a higher temperature
(e.g., 40°C). Such a mutant gp34, termed T4 tsB45, having a
mutation at its C-terminal end such that P34 attaches to the
distal tail fiber half at 30°C but can be separated from it
in vitro by incubation at 40°C in the presence of 1% SDS
(unlike wild-type T4, which is stable under these
conditions), has been reported (Seed, 1980, Studies of the
Bacteriophage T4 Proximal Half Tail Fiber, Ph.D. Thesis,
California Institute of Technology), and can be used.

Proteins which catalyze the formation of correct
(lowest energy) stable secondary (2°) structure of proteins
are called chaperone proteins. (Often, especially in
globular proteins, this stabilization is aided by tertiary
structure, e.g., stabilization of β-sheets by their
interaction in β-barrels or by interaction with α-helices.)
Normally chaperonins prevent intrachain or interchain
interactions which would produce untoward metastable folding
intermediates and prevent or delay proper folding. There are
two known accessory proteins, gp57 and gp38, in the
morphogenesis of T4 phage tail fibers which are sometimes
called chaperonins because they are essential for proper
maturation of the protein oligomers but are not present in
the final structures.

The usual chaperonin systems (e.g., groEL/ES)
interact with certain oligopeptide moieties of the gene
product to prevent unwanted interactions with oligopeptide
moieties elsewhere on the same polypeptide or another
peptide. These would form metastable folding intermediates
which retard or prevent proper folding of the polypeptide to
its native (lower energy) state.

Gp57, probably in conjunction with some membrane
protein(s), has the role of juxtaposing (and aligning) and/or
initiating the folding of 2 or 3 identical gp37 molecules.
The aligned peptides then zip up (while mutually stabilizing
their nascent β-structures) to form a beam, without further
interaction with gp57. Gp57 acts in T4 assembly not only for
oligomerization of gp37 but also for gp34 and gp12.

STRUCTURAL COMPONENTS FOR SELF-ASSEMBLY OF BEAMS IN VITRO

As an alternative to starting the polymerization of
chimers with a preformed chimeric or natural
oligomeric unit called an initiator produced in vivo,
molecules (preferably peptides) that can self-assemble can be
produced as fusion proteins, fused to the N- or C-terminus of
tail fiber variants of the invention (chimers,
deletion/insertion constructs) to align their ends and thus
to facilitate their subsequent unaided folding into
oligomeric, stable β-folded rod-like (beam) units in vitro,
in the absence of the normally required chaperonin proteins
(e.g., gp57) and host cell membrane proteins.
As an illustration, consider the P37 unit as an
initiator of gp37-36 oligomerization and polymerization.
Normally, proper folding of gp37 to a P37 initiator requires
a phage infected cell membrane, and two chaperone proteins,
gp38 and gp57. In a preferred embodiment, the need for gp38
can be obviated by use of a mutation, ts3813 (a duplication
of 7 residues just downstream of the transition zone of gp37)
which suppresses gene 38 (Wood, W.B., F.A. Eiserling and R.A.
Crowther, 1994, "Long Tail Fibers: Genes, Proteins,
Structure, and Assembly," in Molecular Biology of
Bacteriophage T4 (Jim D. Karam, Editor), American Society for
Microbiology, Washington, D.C., pp. 282-290). If a moiety
that self-assembles into a dimer or trimer or other oligomer
("self-assembling moiety") is fused to a C-terminal deletion
of gp37 downstream or upstream of the transition region [the
transition region is a conserved 17 amino acid residue region
in T4-like tail fiber proteins where the structure of the
protein narrows to a thin fiber; see Henning et al., 1994,
"Receptor recognition by T-even-type coliphages," in
Molecular Biology of Bacteriophage T4, Karam (ed.), American
Society for Microbiology, Washington, D.C., pp. 291-298; Wood
et al., 1994, "Long tail fibers: Genes, proteins, structure,
and assembly," in Molecular Biology of Bacteriophage T4,
Karam (ed.), American Society for Microbiology, Washington,
D.C., pp. 282-290], when it is expressed, the self-assembling
moiety will oligomerize in parallel and thus align the fused
gp37 peptides, permitting them to fold in vitro, in the
absence of other chaperonin proteins.

If P37 is a dimer (Figure 8A), the self-assembling
moiety can be a self-dimerizing peptide such as the leucine
zipper, made from residues 250-281 from the yeast
transcription factor, GCN4 (E.K. O'Shea, R. Rutkowski and P.S.
Kim, Science 243:538, 1989) or the self-dimerizing mutant
leucine zipper peptide, pIL, in which the a positions are
substituted with isoleucine and the d positions with leucine
(Harbury P.B., T. Zhang, P.S. Kim and T. Alber. 1993. A
Switch Between Two-, Three-, and Four-Stranded Coiled Coils
in GCN4 Leucine Zipper Mutants. Science, 262:1401-1407). If
P37 is a trimer (Figure 8B), the self-assembling moiety can
be a self-trimerizing mutant leucine zipper peptide, pII, in
which both the a and d positions are substituted with
isoleucine (Harbury P.B., et al. ibid). Alternatively, a
collagen peptide can be used as the self-assembling moiety,
such as that described by Bella et al. (J. Bella, M. Eaton,
B. Brodsky and H.M. Berman. 1994. Crystal and Molecular
Structure of a Collagen-Like Peptide at 1.9 Å Resolution.
Science, 266:75-81), which self-aligns by an inserted
specific non-repeating alanine residue near the center.

Self-assembling moieties can be used to make
initiators for polymerizations in the absence of the normal
initiators. For example, to create an initiator for
oligomerization and polymerization of the chimeric monomer,
gp37-36, gp37-36-C2 can be used as illustrated in Figure 9.
(C2 means that a dimer forming peptide is fused to the
C-terminus of the gp36 moiety. This is used if the beam is a
dimeric structure. Otherwise C3, a trimer forming peptide
fused to the C-terminus, would be used.) Furthermore, use
of, e.g., the E. coli lac repressor N-terminus, which
associates as a tetramer with two coils facing in each
direction, could join two dimers (or polymers of dimers) end
to end, either at their N- or C-termini depending upon which
end the self-assembling peptides were placed. They could
also join N- to C-termini. In any case, alone, they could
only form a dimer, each end of which would be extensible by
adding an appropriate chimer monomer (as shown for the
simpler case in Figure 9).

In an alternative embodiment, the self-assembling
moiety can be fused to the N-termini of the chimer. In a
specific embodiment, the self-assembling moiety is fused to
at least a 10 amino acid portion of a T-even-like tail fiber
protein.

A self-assembling moiety that assembles into a
heteroligomer can also be used. For example, if
polymerization between beams is directed by the surface of a
dimeric cross-β surface, addition of a heterodimeric unit
with one surface which does not promote further
polymerization would be very useful to cap the penultimate
unit and thus terminate polymerization. If the two types of
coiled regions of the self-assembling moiety are much more
attractive to each other than to themselves, then all of the
dimers will be heterodimers. Such is the case for the
N-terminal Jun and Fos leucine zipper regions.

A further advantage to such heterodimeric units is
the ability to stage polymerization and thus build one unit
(or one surface in a 2D array) at a time. For example,
suppose surface A attaches to B but neither attaches to
itself ([A<->B] is used to symbolize this type of
interaction). Mix A/A and B/B° (B° is attached to a matrix
for easy purification). This will form B°/B-A/A. Now wash
out A/A and add B/B. The construct is now B°/B-A/A-B/B. Now
add A/A°. The construct is now B°/B-A/A-B/B-A/A° and no more
beams can be added. There are of course many other
possibilities.
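
The staged scheme can be followed step by step with a small model. This is a sketch under stated assumptions: the class and method names are hypothetical, the matrix-bound and capped surfaces are written B° and A° (an assumed reading of the notation above), and each addition step is taken to attach exactly one beam because free beams cannot bind to one another.

```python
# Surfaces that attach to each other; A° and B° attach to nothing.
COMPLEMENT = {"A": "B", "B": "A"}


class StagedRod:
    """Toy model of a matrix-bound rod built one beam at a time."""

    def __init__(self, anchor_surface, free_surface):
        # The first beam is held on the matrix by anchor_surface.
        self.beams = [(anchor_surface, free_surface)]

    @property
    def exposed(self):
        """Surface currently available at the growing end."""
        return self.beams[-1][1]

    def add(self, binding_surface, far_surface):
        """Attach one beam if it complements the exposed end; report success."""
        if COMPLEMENT.get(self.exposed) != binding_surface:
            return False
        self.beams.append((binding_surface, far_surface))
        return True

    def __str__(self):
        return "-".join(f"{near}/{far}" for near, far in self.beams)


rod = StagedRod("B°", "B")   # B/B° bound to the matrix
rod.add("A", "A")            # mix with A/A, then wash
rod.add("B", "B")            # add B/B
rod.add("A", "A°")           # add the capping beam A/A°
```

After the capping step the exposed surface is A°, so any further call to `add` returns `False`, mirroring the statement that no more beams can be added.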

APPLICATIONS

The uses of the nanostructures of the present
invention are manifold and include applications that require
highly regular, well-defined arrays of fibers, cages, or
solids, which may include specific attachment sites that
allow them to associate with other materials.

In one embodiment, a three-dimensional hexagonal
array of tubes is used as a molecular sieve or filter,
providing regular vertical pores of precise diameter for
selective separation of particles by size. Such filters can
be used for sterilization of solutions (i.e., to remove
microorganisms or viruses), or as a series of
molecular-weight cut-off filters. In this case, the protein
components of the pores may be modified so as to provide
specific surface properties (i.e., hydrophilicity or
hydrophobicity, ability to bind specific ligands, etc.).
Among the advantages of this type of filtration device is the
uniformity and linearity of pores and the high pore to matrix
ratio.

In another embodiment, long one-dimensional fibers
are incorporated, for example, into paper or cement or
plastic during manufacture to provide added wet and dry
tensile strength.

In still another embodiment, different
nanostructure arrays are impregnated into paper and fabric as
anti-counterfeiting markers. In this case, a simple
color-linked antibody reaction (such as those commercially
available in kits) is used to verify the origin of the
material. Alternatively, such nanostructure arrays could
bind dyes or other substances, either before or after
incorporation, to color the paper or fabrics or modify their
appearance or properties in other ways.

KITS 

The invention also provides kits for making
nanostructures, comprising in one or more containers the
chimers and deletion constructs of the invention. For
example, one such kit comprises in one or more containers
purified gp35 and purified gp36-34 chimer. Another such kit
comprises purified gp37-36 chimer.

The following examples are intended to illustrate
the present invention without limiting its scope.

In the examples below, all restriction enzymes,
nucleases, ligases, etc. are available from
numerous commercial sources, such as New England Biolabs
(NEB), Beverly, MA; Life Technologies (GIBCO-BRL),
Gaithersburg, MD; and Boehringer Mannheim Corp. (BMC),
Indianapolis, IN.

EXAMPLE 1

DESIGN, CONSTRUCTION AND EXPRESSION OF INTERNALLY DELETED P37

The gene encoding gp37 contains two sites for the
restriction enzyme Bgl II, the first cleavage occurring after
nucleotide 293 and the second after nucleotide 1486 (the
nucleotides are numbered from the initiator methionine codon
ATG). Thus, digestion of a DNA fragment encoding gp37 with
Bgl II, excision of the intervening fragment (nucleotides
294-1485) and re-ligation of the 5' and 3' fragments results
in the formation of an internally deleted gp37, designated
ΔP37, in which arginine-98 is joined with serine-497.

The restriction digestion reaction mix contains:

gp37 plasmid DNA (1 µg/µl)    2 µl
NEB buffer #2 (10X)           1 µl
H2O                           6 µl
Bgl II (10 U/µl)              1 µl

The gp37 plasmid signifies a pT7-5 plasmid into which gene 37
has been inserted in the multiple cloning site, downstream of
a good ribosome binding site and of gene 57 to chaperone the
dimerization. The reaction is incubated for 1 h at 37°C.
Then, 89 µl of T4 DNA ligase buffer and 1 µl of T4 DNA ligase
are added, and the reaction is continued at 16°C for 4 hours.
2 µl of the Stu I restriction enzyme are then added, and
incubation continued at 37°C for 1 h. (The Stu I restriction
enzyme digests residual plasmids that were not cut by Bgl II
in the first step, reducing their transformability by about
100-fold.)
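
The volumes in this protocol can be checked with a few lines of arithmetic. The sketch below assumes the digest components are 2, 1, 6 and 1 µl as read from the reaction mix listed above; the variable and function names are illustrative only.

```python
# Digest components in microliters (as read from the reaction mix above).
DIGEST_UL = {
    "gp37 plasmid DNA (1 ug/ul)": 2.0,
    "NEB buffer #2 (10X)": 1.0,
    "H2O": 6.0,
    "Bgl II (10 U/ul)": 1.0,
}


def total_volume(components):
    """Sum the component volumes in microliters."""
    return sum(components.values())


def final_fold(stock_fold, stock_ul, total_ul):
    """Final buffer strength (X) after dilution into the total volume."""
    return stock_fold * stock_ul / total_ul


digest_total_ul = total_volume(DIGEST_UL)
buffer_strength = final_fold(10, DIGEST_UL["NEB buffer #2 (10X)"], digest_total_ul)
dna_ug = DIGEST_UL["gp37 plasmid DNA (1 ug/ul)"] * 1.0   # at 1 ug/ul
enzyme_units = DIGEST_UL["Bgl II (10 U/ul)"] * 10.0      # at 10 U/ul
ligation_total_ul = digest_total_ul + 89.0 + 1.0         # buffer + ligase added
```

On this reading the digest is a 10 µl reaction at 1X buffer, and the ligation step dilutes it tenfold to 100 µl, which favors intramolecular re-ligation of the cut plasmid.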

The reaction mixture is then transformed into E.
coli strain BL21, obtained from Novagen, using standard
procedures. The transformation mixture is plated onto
nutrient agar containing 100 µg/ml ampicillin, and the plates
are incubated overnight at 37°C.

Colonies that appear after overnight incubation are
picked, and plasmid DNA is extracted and digested with Bgl II
as above. The restriction digests are resolved on 1% agarose
gels. A successful deletion is evidenced by the appearance
after gel electrophoresis of a new DNA fragment of 4.2 kbp,
representing the undeleted part of gene 37 which is still
attached to the plasmid and which re-formed a Bgl II site by
ligation. The 1.2 kbp DNA fragment bounded by Bgl II sites in
the original gene is no longer in the plasmid and so is
missing from the gel.

Plasmids selected for the predicted deletion as
above are transformed into E. coli strain BL21(DE3).
Transformants are grown at 30°C until the density (A600) of the
culture reaches 0.6. IPTG is then added to a final
concentration of 0.4 mM and incubation is continued at 30°C
for 2 h, after which the cultures are chilled on ice. 20 µl
of the culture is then removed and added to 20 µl of a
two-fold concentrated "cracking buffer" containing 1% sodium
dodecyl sulfate, glycerol, and tracking dye. 15 µl of this
solution are loaded onto a 10% polyacrylamide gel; a second
aliquot of 15 µl is first incubated in a boiling water bath
for 3 min and then loaded on the same gel. After
electrophoresis, the gel is fixed and stained. Expression of
the deleted gp37 is evidenced by the appearance of a protein
species migrating at an apparent molecular mass of 65,000-70,000
daltons in the boiled sample. The extent of dimerization is
suggested by the intensity of higher-molecular mass species
in the unboiled sample and/or by the disappearance of the
65,000-70,000 dalton protein band.

The ability of the deleted polypeptide to dimerize
appropriately is directly evaluated by testing its ability to
be recognized by an anti-P37 antiserum that reacts only with
mature P37 dimers, using a standard protein immunoblotting
procedure.

An alternative assay for functional dimerization of
the deleted P37 polypeptide (also referred to as ΔP37) is its
ability to complement in vivo a T4 37⁻ phage, by first
inducing expression of the ΔP37 and then infecting with the
T4 mutant, and detecting progeny phage.

A ΔP37 was prepared as described above, and found
capable of complementing a T4 37⁻ phage in vivo.

EXAMPLE 2

DESIGN, CONSTRUCTION AND EXPRESSION OF A GP37-36 CHIMER

The starting plasmid for this construction is one
in which the gene encoding gp37 is cloned immediately
upstream (i.e., 5') of the gene encoding gp36. The plasmid
is digested with Hae III, which deletes the entire 3' region
of gp37 DNA downstream of nucleotide 724 to the 3' terminus,
and also removes the 5' end of gp36 DNA from the 5' terminus
to nucleotide 349. The reaction mixture is identical to that
described in Example 1, except that a different plasmid DNA
is used, and the enzyme is Hae III. Ligation using T4 DNA
ligase, bacterial transformation, and restriction analysis
are also performed as in Example 1. In this case, excision
of the central portion of the gene 37-36 insert and
religation yields a novel insert of 346 in-frame codons,
which is cut only once by Hae III (after nucleotide 725). The
resulting construct is then expressed in E. coli BL21(DE3) as
described in Example 1.

Successful expression of the gp37-36 chimer is
evidenced by the appearance of a protein product of about
35,000 daltons. This protein will have the first 242
N-terminal amino acids of gp37 fused to the final 104
C-terminal amino acids of gp36 (numbered 118-221). The
utility of this chimer depends upon its ability to dimerize
and attach end-to-end. That is, the carboxy termini of said
polypeptide will have the capability of interacting with the
amino terminus of the P37 protein dimer of bacteriophage T4
to form an attached dimer, and the amino terminus of the
dimer of said polypeptide will have the capability of
interacting with other said chimer polypeptides. This
property can be tested by assaying whether introduction of
ΔP37 initiates dimerization and polymerization.
Alternatively, polyclonal antibodies specific to P36 dimer
may be used to detect P36 subsequent to initiation of
dimerization by ΔP37.

A gp37-36 chimer was prepared similarly to the
procedures described above, except that the restriction
enzyme TaqI was used instead of Hae III. Briefly, the 5'
fragment resulting from TaqI digestion of gene 37 was ligated
to the 3' fragment resulting from TaqI digestion of gene 36.
This produced a construct encoding a gp37-36 chimer in which
amino acids 1-48 of gp37 were fused to amino acids 100-221 of
gp36. This construct was expressed in E. coli BL21(DE3), and
the chimer was detected as an 18 kD protein. This gp37-36
chimer was found to inhibit the growth of wild type T4 when
expression of the gp37-36 chimer was induced prior to
infection (in an in vitro phage inhibition assay).
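
The residue counts of the two gp37-36 chimers in this Example follow directly from the stated residue ranges. The sketch below tallies them and gives a rough mass estimate; the figure of about 110 daltons per residue is a common rule of thumb and an assumption here, not a value from this description, and the function names are illustrative.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, an assumption


def segment_length(first, last):
    """Number of residues in an inclusive range first..last."""
    return last - first + 1


def chimer_length(n_terminal_range, c_terminal_range):
    """Total residues in a chimer built from two inclusive ranges."""
    return segment_length(*n_terminal_range) + segment_length(*c_terminal_range)


def approx_mass_kda(n_residues):
    """Rough polypeptide mass in kilodaltons from the residue count."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# Hae III chimer: gp37 residues 1-242 fused to gp36 residues 118-221.
hae3_residues = chimer_length((1, 242), (118, 221))
# TaqI chimer: gp37 residues 1-48 fused to gp36 residues 100-221.
taq1_residues = chimer_length((1, 48), (100, 221))
```

The Hae III count agrees with the 346 in-frame codons stated above. Mass estimates from an average residue mass are only approximate, and apparent sizes on SDS gels can differ from them.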

EXAMPLE 3

MUTATION OF THE GP37-36 CHIMER
TO PRODUCE COMPLEMENTARY SUPPRESSORS

The goal of this construction is to produce two
variants of a dimerizable P37-36 chimer: one in which the
N-terminus of the polypeptide is mutated (A, designated
*P37-36) and one in which the C-terminus of the polypeptide
is mutated (B, designated P37-36*). The requirement is that
the mutated *P37 N-terminus cannot form a joint with the
wild-type P36 C-terminus, but only with the mutated P36*
C-terminus. The rationale is that A and B each cannot
polymerize independently (as the parent P37-36 protein can),
but can only associate with each other sequentially (i.e.,
P37-36* + *P37-36 -> P37-36* - *P37-36).

A second construct, *P37-P36*, is formed by
recombining *P37-36 and P37-36* in vitro. When the monomers
*gp37-36* and gp37-36 are mixed in the presence of P37
initiator, gp37-36 would dimerize and polymerize to
(P37-36)n; similarly, *P37 would only catalyze the
polymerization of gp37-36* to (*P37-36*)n. In this case,
the two chimers could be of different size and different
primary sequence with different potential side-group

and positive colonies are used as source of plasmid for the
next step.

Several of these mutated plasmids are recovered and
subjected to a second round of mutagenesis, this time using
doped oligonucleotides that introduce random mutations into
the N-terminal region of gp37 present on the same plasmid.
Again, the (now doubly) mutagenized plasmids are transformed
into the sup° strain of E. coli and transformants are
infected with the mutant T4 phage. At this stage, bacterial
plates are screened for the re-appearance of "nibbled"
colonies. A nibbled colony at this stage indicates that the
phage has replicated by virtue of suppression of the
non-functional gp36* mutation(s) by the *P37 mutation. In
other words, such colonies must contain novel *P37
polypeptides that have now acquired the ability to interact
with the P36* proteins encoded on the same plasmid.

The *P37-36 and P37-36* paired suppressor chimers
(A and B as above) are then constructed in the same manner as
described in Example 2. In this case, however, *P37 is used
in place of wild type P37 and P36* is used in place of wild
type P36. A *P37-36* chimer can now be made by restriction
of *P37-36 and P37-36* and religation in the recombined
order. The *P37-36* can be mixed with the P37-36 chimer, and
the polymerization of each can be accomplished independently
in the presence of the other. This is useful when the
rod-like central portions of these chimers have been modified
in different ways.

EXAMPLE 4

DESIGN, CONSTRUCTION AND EXPRESSION OF A GP36-34 CHIMER
The starting plasmid for this construction is one in which the
vector containing gene 57 and the gene encoding gp36 is cloned
immediately upstream (i.e., 5') of the gene encoding gp34. The
plasmid is digested with NdeI, which cuts after bp 219 of gene 36
and after bp 2594 of gene 34, thereby deleting the final 148
C-terminal codons from the gp36 moiety and the first 865
N-terminal codons from the gp34 moiety. The reaction mixture is
identical to that described in Example 1, except that a different
plasmid DNA is used, and the enzyme used is NdeI (NEB). Ligation
using T4 DNA ligase, bacterial transformation, and restriction
analysis are also performed as in Example 1. This results in a
new hybrid gene encoding a protein of 497 amino acids (73
N-terminal amino acids of gp36 and 424 C-terminal amino acids of
gp34, numbered 866-1289).
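By way of illustration only, the residue arithmetic of this construct can be checked with a short Python sketch. The residue counts are those stated above; the mean residue mass of about 110 daltons is a common rule of thumb and is an assumption introduced here, not a value taken from this description. The function name `chimer_summary` is likewise hypothetical.

```python
# Residue counts stated in Example 4.
GP36_KEPT_N_TERMINAL = 73       # gp36 residues retained (NdeI cuts after bp 219)
GP36_DELETED_C_TERMINAL = 148   # gp36 codons removed
GP34_DELETED_N_TERMINAL = 865   # gp34 codons removed
GP34_FULL_LENGTH = 1289         # length of gp34 (SEQ ID NO: 2)

# ASSUMPTION: rule-of-thumb average mass per amino acid residue, in daltons.
MEAN_RESIDUE_MASS_DA = 110


def chimer_summary():
    """Recompute the size of the gp36-34 chimer from the stated deletions."""
    gp36_codons_before_cut = 219 // 3                   # bp 219 ends codon 73
    gp34_first_kept = GP34_DELETED_N_TERMINAL + 1       # first retained gp34 residue
    gp34_kept = GP34_FULL_LENGTH - GP34_DELETED_N_TERMINAL
    total = GP36_KEPT_N_TERMINAL + gp34_kept
    return {
        "gp36_codons_before_cut": gp36_codons_before_cut,
        "gp36_full_length": GP36_KEPT_N_TERMINAL + GP36_DELETED_C_TERMINAL,
        "gp34_first_kept": gp34_first_kept,
        "gp34_kept": gp34_kept,
        "chimer_length": total,
        "approx_mass_da": total * MEAN_RESIDUE_MASS_DA,
    }


if __name__ == "__main__":
    for key, value in chimer_summary().items():
        print(f"{key}: {value}")
```

Under the assumed residue mass, the estimate falls close to the product size of about 55,000 daltons expected for the expressed chimer.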

As an alternative, the starting plasmid is cut with SphI at
bp 648 in gene 34, and the Exo-Size Deletion Kit (NEB) is used to
create deletions as described above.

The resulting construct is then expressed in E. coli BL21(DE3)
as described in Example 1. Successful expression of the gp36-34
chimer is evidenced by the appearance of a protein product of
about 55,000 daltons. Preferably, the amino termini of the
polypeptide homodimer have the capability of interacting with the
gp35 protein, and then the carboxy termini have the capability of
interacting with other attached gp35 molecules. Successful
formation of the dimer can be detected by reaction with anti-P36
antibodies or by attachment of gp35 or by the in vitro phage
inhibition assay described in Example 2.

EXAMPLE 5

ISOLATION OF THERMOLABILE PROTEINS FOR SELF-ASSEMBLY

Thermolabile structures can be utilized in nanostructures for:
a) initiation of chimer polymerization (e.g., gp37-36) at low
temperature and subsequent inactivation of and separation from
the initiator at high temperature; b) initiation of angle
formation between P36 and gp35 (e.g., variants of gp35 that have
thermolabile attachment sites for P36 N-termini or P34 C-termini,
a variant P36 that forms a thermolabile attachment to gp35, and a
variant P34 with a thermolabile C-terminal attachment site).
Thermolability may be reversible, permitting reattachment of the
appropriate termini when the lower temperature is restored, or it
may be irreversible.
To create a variant gp37 that permits heat-induced separation of
the P36-P37 junction, the 5' end of gp37 DNA is randomly
mutagenized using doped oligonucleotides as described above. The
mutagenized DNA fragment is then recombined into T4 phage by
infection of the cell containing the mutagenized DNA by a T4
phage containing two amber mutations flanking the mutagenized
region. Following a low-multiplicity infection, non-amber phage
are selected on E. coli Su° at low temperature (30°C). The
progeny of these plaques are resuspended in buffer and challenged
by heating at 60°C. At this temperature, wild-type tail fibers
remain intact and functional, whereas the thermolabile versions
release the terminal P37 units and thus render those phage
non-infectious.

At this stage, wild type phage are removed by: 1) adsorbing the
wild type phage to sensitive bacteria and sedimenting (or
filtering out) the bacteria with the adsorbed wild type phage; or
2) reacting the lysate with anti-P37 antibody, followed by
immobilized Protein A and removal of adsorbed wild type phage.
Either method leaves the noninfectious mutant phage particles in
the supernatant fluid or filtrate, from which they can be
recovered. The non-infectious phage lacking terminal P37 moieties
(and probably the rest of the tail fibers as well) are then
treated with 6M urea and mixed with bacterial spheroplasts to
permit infection at low multiplicity, whereupon they replicate at
low temperature and release progeny. Alternatively, infectious
phage are reconstituted by in vitro incubation of the mutant
phage with wild type P37 at 30°C; this is followed by infection
of intact bacterial cells using the standard protocol. The latter
method of infection specifically selects mutant phage in which
the thermolability of the P36-P37 junction is reversible.

Using either method, the phage populations are subjected to
multiple rounds of selection as above, after which individual
phage particles are isolated by plaque purification at 30°C.
Finally, the putative mutants are evaluated individually for the
following characteristics: 1) loss of infectivity after
incubation at high temperatures (40-60°C), as measured by a
decrease in titer; 2) loss of P37 after incubation at high
temperature, as measured by decrease in binding of P37-specific
antibody to phage particles; and 3) morphological changes in the
tail fibers after incubation at high temperatures, as assessed by
electron microscopy.

After mutants are isolated and their phenotypes confirmed, the
P37 gene is sequenced. If the mutations localize to particular
regions or residues, those sequences are targeted for
site-directed mutagenesis to optimize the desired
characteristics.

Finally, the mutant gene 37 is cloned into expression plasmids
and expressed individually in E. coli as in Example 1. The mutant
P37 dimers are then purified from bacterial extracts and used in
in vitro assembly reactions.

In a similar fashion, mutant gp35 polypeptides can be isolated
that exhibit a thermolabile interaction with the N-terminus of
P36 or the C-terminus of P34. For thermolabile interaction with
P34, phage are incubated at high temperature, resulting in the
loss of the entire distal half of the tail fiber (i.e.,
gp35-P36-P37). The only differences in the experimental protocol
are that, in this case, 1) random mutagenesis is performed over
the entire gp35 gene; 2) wild-type phage (and distal half-fibers
from thermolabile mutants) are separated from thermolabile mutant
phage that have been inactivated at high temperature (but still
have proximal half tail fibers attached) by precipitating both
the distal half-fibers and the phage particles containing intact
tail fibers with any of the anti-distal half tail-fiber
antibodies followed by Staphylococcal A-protein beads; 3) the
mutant phage remaining in the supernatant are reactivated by
incubation at low temperature with bacterial extracts containing
wild type intact distal half fibers; and 4) stocks of
thermolabile gene 35 mutants grown at 30°C can be tested for
reversible thermolability by inactivation at 60°C and
reincubation at 30°C. Inactivation is performed on a concentrated
suspension of phage, and reincubation at 30°C is performed either
before or after dilution. If phage are successfully reactivated
before, but not after, dilution, this indicates that their gp35
is reversibly thermolabile.
To create a gene 36 mutation with a thermolabile gp35-P36
linkage, the C-terminus of gene 36 is mutagenized as described
above, and the mutant selected for reversibility. An alternative
is to mutagenize gp35 to create a gene 35 mutant in which the
gp35-P36 linkage will dissociate at 60°C. In this case,
incubation with anti-gp35 antibodies can be used to precipitate
the phage without P36-P37 and thus to separate them from the
wild-type phage and distal half-tail fibers (P36-P37), since the
variant gp35 will remain attached to P34.

EXAMPLE 6

ASSEMBLY OF ONE-DIMENSIONAL RODS

A. Simple assembly: The P37-36 chimer described in Example 2 is
capable of self-assembly, but requires a P37 initiator to bind
the first unit of the rod. Therefore, a P37 or a ΔP37 dimer is
either attached to a solid matrix or is free in solution to serve
as an initiator. If the initiator is attached to a solid matrix,
a thermolabile P37 dimer is preferably used. Addition of an
extract containing gp37-36, or the purified gp37-36 chimer,
results in the assembly of linear multimers of increasing length.
In the matrix-bound case, the final rods are released by a brief
incubation at high temperature (40-60°C, depending on the
characteristics of the particular thermolabile P37 variant).

The ratio of initiator to gp37-36 can be varied, and the size
distribution of the rods is measured by any of the following
methods: 1) size exclusion chromatography; 2) increase in the
viscosity of the solution; and 3) direct measurement by electron
microscopy.
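The dependence of rod size on the initiator-to-chimer ratio may be pictured with the following Python sketch. It is a deliberately idealized model and not a method of this description: it assumes that every chimer unit is incorporated, that rods grow only from initiators, and that each new unit attaches to a rod chosen uniformly at random. The function names and the numbers used are hypothetical.

```python
import random


def simulate_rod_lengths(n_initiators, n_chimer_units, seed=0):
    """Distribute chimer units over initiator-primed rods, one unit at a time.

    ASSUMPTION: each unit joins the growing end of a uniformly random rod.
    Returns a list with the number of units in each rod.
    """
    rng = random.Random(seed)
    lengths = [0] * n_initiators
    for _ in range(n_chimer_units):
        lengths[rng.randrange(n_initiators)] += 1
    return lengths


def mean_length(lengths):
    """Average number of units per rod."""
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    # Hypothetical amounts: 10 initiators and 200 chimer units.
    rods = simulate_rod_lengths(10, 200)
    print("units per rod:", sorted(rods))
    print("mean units per rod:", mean_length(rods))
```

Under these assumptions the mean number of units per rod equals the chimer-to-initiator ratio, while individual rods scatter around that mean, which is why the size distribution is measured experimentally.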

B. Staged assembly: The P37-36 variants *P37-36 and P37-36*
described in Example 3 cannot self-polymerize. This allows the
staged assembly of rods of defined length, according to the
following protocol:

1. Attach initiator P37 (preferably thermolabile) to a matrix.

2. Add excess *gp37-36 to attach and oligomerize as P37-36
homooligomers to the N-terminus of P37.

3. Wash out unreacted *gp37-36 and flood with gp37-36*.

4. Wash out unreacted gp37-36* and flood with excess *gp37-36.

5. Repeat steps 2-4, n-1 times.

6. Release assembly from matrix by brief incubation at high
temperature as above.

The linear dimensions of the protein rods in the batch will
depend upon the lengths of the unit heterochimers and the number
of cycles (n) of addition. This method has the advantage of
ensuring absolute reproducibility of rod length and a
homogeneous, monodisperse size distribution from one preparation
to another.
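The relationship between cycle number and rod length can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each cycle is read as adding exactly one unit of each variant (possible because neither variant self-polymerizes), the thermolabile initiator is taken to be left on the matrix at release, and the unit lengths passed in are hypothetical placeholders rather than measured values.

```python
def staged_rod(n_cycles, len_a_nm, len_b_nm):
    """Model n cycles of alternating *P37-36 / P37-36* addition.

    len_a_nm and len_b_nm are HYPOTHETICAL unit lengths in nanometers.
    Returns (ordered list of components on the matrix, released rod length).
    """
    if n_cycles < 1:
        raise ValueError("at least one cycle of addition is required")
    components = ["P37 initiator"]
    for _ in range(n_cycles):
        components.append("*P37-36")   # flood with *gp37-36, then wash
        components.append("P37-36*")   # flood with gp37-36*, then wash
    # ASSUMPTION: the initiator stays on the matrix, so it is not counted.
    rod_length_nm = n_cycles * (len_a_nm + len_b_nm)
    return components, rod_length_nm


if __name__ == "__main__":
    components, length_nm = staged_rod(3, 40.0, 50.0)
    print(" -> ".join(components))
    print("released rod length (nm):", length_nm)
```

Because every rod in the batch passes through the same number of cycles, the model yields a single length rather than a distribution, which is the stated advantage of the staged protocol.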

EXAMPLE 7

STAGED ASSEMBLY OF POLYGONS

The following assembly strategy utilizes gp35 as an angle joint
to allow the formation of polygons. For the purpose of this
example, the angle formed by gp35 is assumed to be 137°. The rod
unit comprises the P36-34 chimer described in Example 4, which is
incapable of self-polymerization. The P36-34 homodimer is made
from a bacterial clone in which both gp36-34 and gp57 are
expressed. The gp57 can chaperone the homodimerization of gp36-34
to P36-34.

1. Initiator: The incomplete distal half fiber P36-37 is
attached to a solid matrix by the P37 C-terminus. Thermolabile
gp35 as described in Example 5 is then added to form the intact
initiator.

2. Excess P36-34 chimer is added to attach a single P36-34.
Following binding to the matrix via gp35, the unbound chimer is
washed out.

3. Wild-type (i.e., non-thermolabile) gp35 is then added in
excess. After incubation, the unbound material is washed out.

4. Steps 2 and 3 are repeated 7-8 times.

5. The assembly is released from the matrix by brief incubation
at high temperature.

The released polymeric rod, 8 units long, will form a regular
8-sided polygon, whose sides comprise the P36-34 dimer and whose
joints comprise the wild-type gp35 monomer. However, there will
be some multimers of these 8 units bound as helices. When a unit
does not close, but instead adds another to its terminus, the
unit cannot close further and the helix can build in either
direction. The direction of the first overlap also determines the
handedness of the helix. Ten (or seven)-unit rods may form
helices more frequently than polygons since their natural angles
are 144° (or 128.6°). The likelihood of closure of a regular
polygon depends not only on the average angle of gp35 but also on
its flexibility, which can be further manipulated by genetic or
environmental modification.
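The angles quoted above follow from the standard interior-angle formula for a regular polygon, 180(n-2)/n degrees for n sides. The Python sketch below reproduces them and, as an illustration only, computes the (generally non-integer) number of sides at which a given joint angle would close exactly; the function names are hypothetical.

```python
def interior_angle(n_sides):
    """Interior angle, in degrees, of a regular polygon with n_sides sides."""
    if n_sides < 3:
        raise ValueError("a polygon needs at least three sides")
    return 180.0 * (n_sides - 2) / n_sides


def ideal_side_count(joint_angle_deg):
    """Number of sides at which a rigid joint of this angle would close exactly.

    Inverts the interior-angle formula; the result is usually not an integer.
    """
    if not 0.0 < joint_angle_deg < 180.0:
        raise ValueError("joint angle must lie between 0 and 180 degrees")
    return 360.0 / (180.0 - joint_angle_deg)


if __name__ == "__main__":
    for n in (7, 8, 10):
        print(n, "sides:", round(interior_angle(n), 1), "degrees")
    print("ideal side count at 137 degrees:", round(ideal_side_count(137.0), 2))
```

For the assumed 137° joint the ideal side count lies between 8 and 9 and rounds to 8, so closing a regular octagon (135°) requires each joint to flex by about 2°, consistent with the role of gp35 flexibility noted above.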

The type of polygon that is formed using this protocol depends
upon the length of rod units and the angle formed by the angle
joint. For example, alternating rod units of different sizes can
be used in step 2. In addition, variant gp35 polypeptides that
form angles different than the natural angle of 137° can be used,
allowing the formation of different regular polygons.
Furthermore, for a given polygon with an even number of sides and
equal angles, the sides in either half can be of any size
provided the two halves are symmetric.

GCAGGTGGTG


ATAAAATAAT 


CAACGTAGCT 


TTAGCTGATC 


GTACCGTAGG 


AACTGACGGT


120 


GTTAACGTTG 


ATTACTTAAT 


TCAAGAAAAC 


ACAGTTCAAC 


AGTATGATCC 


AACTCGTGGA 


180 


TATTTAAAAG 


ATTTTGTAAT 


CATTTATGAT 


AACCGCTTTT 


GGGCTGCTAT 


AAATGATATT 
CCAAAACCAG


CAGGAGCTTT 


TAATAGCGGA 


CGCTGGAGAG 


CATTACGTAC 


CGATGCTAAC
TGGATTACGG


TTTCATCTGG 


TTCATATCAA 


TTAAAATCTG 


GTGAAGCAAT 


TTCGGTTAAC
ACCGCAGCTG


GAAATGACAT 


CACGTTTACT


TTACCATCTT 


CTCCAATTGA 


TGGTGATACT 
ATCGTTCTCC


AAGATATTGG 
GGAGTTAACC
TGTAGCTCCA
GTACAAAGTA


TTGTAAACTT 


TAGAGGTGAA 


CAGGTACGTT 


CAGTACTAAT 


GACTCATCCA 
AAGTCACAGC


TAGTTTTAAT 


TTTTAGTAAT 


CGTCTGTGGC 


AAATGTATGT 


TGCTGATTAT 


600 


AGTAGAGAAG 


CTATAGTTGT 


AACACCAGCG 


AATACTTATC 


AAGCGCAATC 


CAACGATTTT 


660 


ATCGTACGTA 


GATTTACTTC 


TGCTGCACCA 


ATTAATGTCA 


AACTTCCAAG 


ATTTGCTAAT 
CATGGCGATA


TTATTAATTT 


CGTCGATTTA 
GTTCCAACTG
AGATTCTTTA
1320


AATGATTCTA 


CTAGAGCAAG 


ATTAGGCGTA 


ATTGCTTTAG


CTACACAAGC 


TCAAGCTAAT 


1380 


GTCGATTTAG 


AAAATTCTCC 


ACAAAAAGAA 


TTAGCAATTA 


CTCCAGAAAC 


GTTAGCTAAT 



1440 


CGTACTGCTA 


CAGAAACTCG 


CAGAGGTATT 


GCAAGAATAG 


CAACTACTGC 


TCAAGTGAAT 


1500 


GAGAACACCA 


CATTCTCTTT 


TGCTGATGAT 


ATTATCATCA 


CTCCTAAAAA 


GCTGAATGAA 


1560 


AGAACTGCTA 


CAGAAACTCG 


TAGAGGTGTC 


GCAGAAATTG 


CTACGCAGCA 


AGAAACTAAT 


1620 


GCAGGAACCG 


ATGATACTAC 


AATCATCACT 


CCTAAAAAGC 


TTCAAGCTCG 


TCAAGGTTCT 


1680 


GAATCATTAT 


CTGGTATTGT 


AACCTTTGTA 


TCTACTGCAG 


GTGCTACTCC 


AGCTTCTAGC 


1740 


CGTGAATTAA 


ATGGTACGAA 


TGTTTATAAT 


AAAAACACTG 


ATAATTTAGT 


TGTTTCACCT


1800 


AAAGCTTTGG 


ATCAGTATAA 


AGCTACTCCA 


ACACAGCAAG 


GTGCAGTAAT 


TTTAGCAGTT


1860 


GAAAGTGAAG 


TAATTGCTGG 


ACAAAGTCAG 


CAAGGATGGG 


CAAATGCTGT 


TGTAACGCCA 


1920 


GAAACGTTAC 


ATAAAAAGAC 


ATCAACTGAT 


GGAAGAATTG 


GTTTAATTGA 


AATTGCTACG 


1980 


CAAAGTGAAG 


TTAATACAGG 


AACTGATTAT 


ACTCGTGCAG 


TCACTCCTAA 


AACTTTAAAT 


2040 


GACCGTAGAG 


CAACTGAAAG 


TTTAAGTGGT 


ATAGCTGAAA 


TTGCTACACA 


AGTTGAATTC 


2100 
GACGCAGGCG


TCGACGATAC 


TCGTATCTCT 


AGTACTGATC 


GTACTTCTGT 


TGTTGCTCTA 


GACCATTATA 


CACTTAATAT 


TCTTGAAGCA 


GCTACGCAGG 


TCGAAGCTGC 


TGCGGGAACA 


CTTTTAGGTA 


CTAAATCTAC 


TGAAGCGCAA 


GAAACTGTGA 


CTGGAACGTC 


AGCAAATACT 


GCGCAGAGTG 


AACCTACTTG 


GGCAGCTACT 


TCTGGTTCAA 


TTACATTCGT 


TGGTAATGAT 


TATGAGAAAA 


ATAGCTATGC 


GGTATCACCA 


TTGCCACTAA 


AAGCAAAAGC 


TGCTGATACA 


TTCATTCGTA 


GGGATATTGC 


ACAGACGGTT 


AATCTGAGTG 


CCCCTCTTGT 


ATCATCTAGT 


AATAGAACAT 


TTACCATCCG 


TAATACAGGA 


CCTGCATCCG 


GGGCAAATCC 


TGCACAGTCA 


GGCGGCGGTA 


GTGATACGAC 


CCGTTCGACA 


CACTTTTATT 


CTCAACGTAA 


TAAAGACGGT 


ATGCCAATAA 


ACATTAATGC 


TTCCGGTTTG 


CGTTCAGTTA 


CAGCCAATGG 


TGAATTCATC 


AACGGTGATT 


ACGGATTCTT 


TATTCGTAAT 


GCAGCCGGTG 


ATCAGACTGG 


TGGTTTTAAT 


TCCGGTCAGA 


TTACAATTGG 


TGAAGGCTTA 


GGCGGTTTAA 


CTGTTAACTC 


GAGAATTCGT 


ACCCGTGCGC 


CAACATCTGA 


TACTGTAGGA 


ACTTATAACC 


AGTTCCCGGG 


TTATTTTAAA 


CTTCCATACT 


TAGAACGTGG 


CGAAGAAGTT 


AACACACTTG 


ATTCGCTTTA 


CCAAGATTGG 


ACCACTCGCT 


GGACACGTAC 


ATGGCAGAAA


GTATTTGACG 


GAGGTAACCC 


TCCTCAACCA 


GCTACAATGG 


GGAATCTTAC 


TATTCGTGAT 


CCTGACCCAG 


TGAATAAAAC 


GGTTAAATTT 


AATTTATGGC 


CGAGATTTGG 


ACAAGGATAT 


CAGTAAGATA 


TAAAATAAGT 


ATAGCGGGTT 


ATGTTAAATT 


TCAGGATAAT 


CCTGTAGGAA 


AGAGTTTTTG 


ACCCTTCCAC 


CGGAGCATTA 



ACACCATTAA AAATTAAAAC 
TCTGGATTAG TTGAATCAGG 
AATGAGACAC AACGTGGTAC 
TTAGATAATG TTTTAATAAC 
GAGGGTGTTA TTAAAGTTGC 
GCTGTATCTC CAAAAAATTT
ACTGCAATAA GAGGTTTTGT 
ACAGTCGGTT CTACCCAAGA 
TATGAATTAA ACCGTGTATT 
AATTTATTGG ATGGTCTAGA 
AATGGTTCAC TAACCTTAAC 
ACTGGTGAAT TTGGTGGTTC 
GCCCCGACTA GTATCGTTTT 
ATGAGTATTC GTGTATGGGG 
GTGTTTGAAG TTGGCGATGA 
AATATAGCGT TTAACATTAA 
ATGAATGTGA ATGGCACTGC 
AGCAAGTCTG CAAATGCTTT
GATGCCTCTA ATACCTATTT 
GGATTACGCC CATTATTAAT 
ATCATTGCCA AAGGTGTTAC 
TCTCAGGGTA CTAAAACATC 
TTCTGGTCAA TCGATATTAA 
ATGGTTGAAA AAACTAATGA 
AAATCTCCTG GTACACTGAC 
ATTACTTATC CAACGACGCC 
ACCAAAAACT CTTGGTCAAG 
TCTGATATCG GTGCTTTACC 
TTCTTGCGAA TTGGTAATGT 
GAATGGGTTG AATAAGAGGT 
GTCCAAACGC CATTTTATCG 
CTTGCCCGCT TTCTACAGCA 
GTCAAACATT TAGGCGCAGG 
GTTGATAGTA AGTCATATGC 
CAGATTTAAT


2160 


AACTCTCTGG 


2220 


ACTTCGTGTA 


2280 


TCCTAAAAAG 


2340 


AACTCAGTCT 


2400 


AAAATGGATT 


2460 


TAAAACTTCA 


2520 


TTTAGAACTG 


2580 


AGCAAATTAT 


2640 


TTCATCTCAG 


2700 


CCAACAAACG 


2760 


ATTGGCCGCT 


2820 


CGAAAAAGGT 


2880 


TAACCAATTT 


2940 


CACATCTCAT 


3000 


TGGTACTGTA 


3060 


AACATTCGGT 


3120 


TAGAGCAATA 


3180 


TTTGCTCACT 


3240 


TAATAATCAA 


3300 


TATAAATTCA 


3360 


TGATTTATAT 


3420 


TGATTCAGCC 


3480


AGTGACTGGG


3540 


TCAGTTTGGT 


3600 


AGAAGCGCGT 


3660 


TTTTGTTCAG 


3720 


ATCTGATAAT 


3780 


TCGCATTGTT 


3840 


ATTATGGAAA 


3900 


GAAAGTAATT 


3960 


GGACCATCAT 


4020 


CCTTCATTTA 


4080 


TTTTTCGACT 


4140 
TCAAATGATA


CTACATCAGC 


TGCTTTTGTT 


AGTTTTCATG 


AATTCTTTGA 


CGAATAATCG 


4200 


AATTGTTGCT


ATATTAACTA 


GTGGAAAGGT 


TAATTTTCCT 


CCTGAAGTAG 


TATCTTGGTT


4260 


AAGAACCGCC 


GGAACGTCTG 


CCTTTCCATC 


TGATTCTATA 


TTGTCAAGAT 


TTGACGTATC 


4320 


ATATGCTGCT 


TTTTATACTT 


CTTCTAAAAG 


AGCTATCGCA 


TTAGAGCATG 


TTAAACTGAG 




TAATAGAAAA 


AGCACAGATG 


ATTATCAAAC 


TATTTTAGAT 


GTTGTATTTG 


AGAGTTTAGA


4440 


AGATGTAGGA 


GCTACCGGGT 


TTCCAAGAAG 


AACGTATGAA 


AGTGTTGAGC 


AATTCATGTC 


4500 


GGCAGTTGGT 


GGAACTAATA 


ACGAAATTGC 


GAGATTGCCA 


ACTTCAGCTG 


CTATAAGTAA


4560 


ATTATCTGAT 


TATAATTTAA 


TTCCTGGAGA 


TGTTCTTTAT 


CTTAAAGCTC 


AGTTATATGC 


4620 


TGATGCTGAT


TTACTTGCTC 


TTGGAACTAC 


AAATATATCT 


ATCCGTTTTT 


ATAATGCATC 


4680 


TAACGGATAT 


ATTTCTTCAA 


CACAAGCTGA 


ATTTACTGGG 


CAAGCTGGGT 


CATGGGAATT 


4740 


AAAGGAAGAT 


TATGTAGTTG 


TTCCAGAAAA 


CGCAGTAGGA 


TTTACGATAT 


ACGCACAGAG 


4800 


AACTGCACAA 


GCTGGCCAAG 


GTGGCATGAG 


AAATTTAAGC 


TTTTCTGAAG


TATCAAGAAA 


4860 


TGGCGGCATT 


TCGAAACCTG 


CTGAATTTGG 


CGTCAATGGT 


ATTCGTGTTA


ATTATATCTG 


4920


CGAATCCGCT 


TCACCTCCGG 


ATATAATGGT 


ACTTCCTACG 


CAAGCATCGT 


CTAAAACTGG 



4980 


TAAAGTGTTT 


GGGCAAGAAT 


TTAGAGAAGT 


TTAAATTGAG 


GGACCCTTCG 


GGTTCCCTTT 




TTCTTTATAA 


ATACTATTCA 


AATAAAGGGG 


CATACAATGG 


CTGATTTAAA 


AGTAGGTTCA 


5100 


ACAACTGGAG 


GCTCTGTCAT 


TTGGCATCAA 


GGAAATTTTC 


CATTGAATCC 


AGCCGGTGAC 


5160 


GATGTACTCT 


ATAAATCATT 


TAAAATATAT 


TCAGAATATA 


ACAAACCACA 


AGCTGCTGAT


5220 


AACGATTTCG 


TTTCTAAAGC 


TAATGGTGGT 


ACTTATGCAT 


CAAAGGTAAC 


ATTTAACGCT 




GGCATTGAAG 


TCCCATATGC 


TCCAAACATC 


ATGAGCCCAT 


GCGGGATTTA 


TGGGGGTAAC 




GGTGATGGTG 


CTACTTTTGA 


TAAAGCAAAT 


ATCGATATTG 


TTTCATGGTA 


TGGCGTAGGA 


5400 


TTTAAATCGT 


CATTTGGTTC 


AACAGGCCGA 


ACTGTTGTAA 


TTAATACACG 


CAATGGTGAT 


5460 


ATTAACACAA 


AAGGTGTTGT 


GTCGGCAGCT 


GGTCAAGTAA 


GAAGTGGTGC 


GGCTGCTCCT 


5520 


ATAGCAGCGA 


ATGACCTTAC 


TAGAAAGGAC 


TATGTTGATG 


GAGCAATAAA 


TACTGTTACT 


5580 


GGAAATGCAA 


ACTCTAGGGT 


GCTACGGTCT 


GGTGACACCA 


TGACAGGTAA 


TTTAACAGCG


5640 


CCAAACTTTT 


TCTCGCAGAA 


TCCTGCATCT 


CAACCCTCAC 


ACGTTCCACG 


ATTTGACCAA 


5700 


ATCGTAATTA 


AGGATTCTGT 


TCAAGATTTC 


GGCTATTATT 


AAGAGGACTT 


ATGGCTACTT 


5760 


TAAAACAAAT 


ACAATTTAAA 


AGAAGCAAAA 


TCGCAGGAAC 


ACGTCCTGCT 


GCTTCAGTAT 


5820 


TAGCCGAAGG 


TGAATTGGCT 


ATAAACTTAA 


AAGATAGAAC 


AATTTTTACT


AAAGATGATT 


5880 


CAGGAAATAT 


CATCGATCTA 


GGTTTTGCTA 


AAGGCGGGCA 


AGTTGATGGC 


AACGTTACTA 


5940 




TTTGAGATTA 


AATGGCGATT 


ATGTACAAAC 


AGGTGGAATG 


ACTGTAAACG 


6000 


GACCCATTGG 


TTCTACTGAT 


GGCGTCACTG 


GAAAAATTTT 


CAGATCTACA 


CAGGGTTCAT 


6060 


TTTATGGAAG 


AGCAACAAAC 


GATACTTCAA 


ATGCCCATTT 


ATGGTTTGAA 


AATGCCGATG 


6120 


GCACTGAACG 


TGGCGTTATA 


TATGCTCGCC 


CTCAAACTAC 


AACTGACGGT 


GAAATACGCC 


6180 
TTAGGGTTAG ACAAGGAACA
ATGGAGGCGA ATTTCAGGCT 
TTGOGGTTGA TACCGTTATT 
TGGTTAATTA TGTTTATCCT 
TTCGOGCTAA GTCCGGTGGT 
CTGATGAAGT TTCTTGGTGG 
ACGATGGCAG AATGATTATC 
CGTCTAGTGA TTATGGCAAC 
CTGTAACTGG CTTGTCATAC 
CTGTTGCTTC TATTACTCCT 
CTGAGGACCA AGGCGCAACT 
AAACACAAGC TGATAATAAC 
GCGGTAAAAT GAACCACTAT 
GTATGGAAAT TAACCCGGGT 
ACGCTGACGG AACTATTTCT 
CTAAATCTAA TAATACTGCG 
GGACTATCCA ATGGAACGGT 
AAGCATGGGG TAACTCATTT 
TATCAGATAG TCAAGGATAT 
CTATTGGACG TATTGAAGCT 
ACGGAAATTT TAGAGTTGTT 
GTTTGTTTGT CCAAGGTGGT 
ACGCACTGAG AATTTGGAAC 
TTTATATTAT TCCAACCAAT 
GACCTGTGAG AATAGGATTA 
TAGATCAAAA TAATGCTTTA 
GAATGCAATT GGGGCAGTCG 
CGGGTGCAGG TTCATTTGCT 
ATATTGATAG AACTGATGCT 
GCAATGGCTG CTATTCATTA 
ATGGCGGCGG AGATAACGGT 
TTAAAAACGG TGATTTTATT 
GAACTGGTAA TATCACTGGT 
CACTTAAAAC TGATATCATG 
GGAAGCACTG


CCAACAGTGA 


ATTCTATTTC 


CGCTCTATAA 


6240 


AACCGTATTT 


TAGCATCAGA


TTCGTTAGTA 


ACAAAACGCA 


6300 


CATGATGCCA 


AAGCATTTGG 


ACAATATGAT 


TCTCACTCTT 


6360 


GGAACCGGTG 


AAACAAATGG 


TGTAAACTAT 


CTTCGTAAAG 


6420 


ACAATTTATC 


ATGAAATTGT 


TACTGCACAA 


ACAGGCCTGG 


6480 


TCTGGTGATA 


CACCAGTATT 


TAAACTATAC 


GGTATTCGTG 


6540


CGTAATAGCC 


TTGCATTAGG 


TACATTCACT 


ACAAATTTCC 


6600 


GTCGGTGTAA 


TGGGCGATAA 


GTATCTTGTT 


CTCGGCGACA 


6660 


AAAAAAACTG 


GTGTATTTGA 


TCTAGTTGGC 


GGTGGATATT 


6720 


GACAGTTTCC 


GTAGTACTCG 


TAAAGGTATA


TTTGGTOGTT 


6780 


TGGATAATGC 


CTGGTACAAA 


TGCTGCTCTC 


TTGTCTGTTC 


6840 


AATGCTGGAG 


ACGGACAAAC 


CCATATCGGG 


TACAATGCTG 


6900 


TTCCGTGGTA 


CAGGTCAGAT 


GAATATCAAT 


ACCCAACAAG 


6960 


ATTTTGAAAT 


TGGTAACTGG 


CTCTAATAAT 


GTACAATTTT 


7020 


TCCATTCAAC 


CTATTAAATT 


AGATAACGAG 


ATATTTTTAA 


7080 


GGTCTTAAAT


TTGGAGCTCC 


TAGCCAAGTT


GATGGCACAA 


7140 


GGTACTCGCG 


AAGGACAGAA 


TAAAAACTAT 


GTGATTATTA 


7200 


AATGCCACTG 


GTGATAGATC 


TCGCGAAACG 


GTTTTCCAAG 


7260 


TATTTTTATG 


CTCATCGTAA 


AGCTCCAACC 


GGCGACGAAA 


7320 


CAATTTGCTG 


GGGATGTTTA 


TGCTAAAGGT 


ATTATTGCCA 


7380 


GGGTCAAGCG 


CTTTAGCCGG 


CAATGTTACT 


ATGTCTAACG 


7440 


TCTTCTATTA 


CTGGACAAGT 


TAAAATTGGC 


GGAACAGCAA 


7500 


GCTGAATATG 


GTGCTATTTT 


CCGTCGTTCG 


GAAAGTAACT 


7560 


CAAAATGAAG 


GAGAAAGTGG 


AGACATTCAC 


AGCTCTTTGA 


7620 


AACGATGGCA 


TGGTTGGGTT 


AGGAAGAGAT 


TCTTTTATAG 


7680 


ACTACGATAA 


ACAGTAACTC 


TCGCATTAAT 


GCCAACTTTA 


7740 


GCATACATTG 


ATGCAGAATG 


TACTGATGCT 


GTTCGCCCGG 


7800 


TCCCAGAATA 


ATGAAGACGT 


CCGTGCGCCG 


TTCTATATGA 


7860 


AGTGCATATG 


TTCCTATTTT 


GAAACAACGT 


TATGTTCAAG 


7920 


GGGACTTTAA 


TTAATAATGG 


TAATTTCCGA 


GTTCATTACC 


7980 


TCTACAGGTC 


CACAGACTGC 


TGATTTTGGA 


TGGGAATTTA 


8040 


TCACCTCGCG 


ATTTAATAGC 


AGGCAAAGTC 


AGATTTGATA 


8100 


GGTTCTGGTA 


ATTTTGCTAA 


CTTAAACAGT 


ACAATTGAAT 


8160 


TCGAGTTACC 


CAATTGGTGC 


TCCGATTCCT 


TGGCCGAGTG 


8220 
ATTCAGTTCC TGCTGCATTT GCTTTGATCG AAGGTCAGAC CTTTGATAAG TCCGCATATC 8280

CAAAGTTAGC TGTTGCATAT CCTAGCGGTG TTATTCCAGA TATGCGCGGG CAAACTATCA 8340 

AGCGTAAACC AAGTGGTCGT GCTGTTTTGA GCGCTGAGGC AGATGGTGTT AAGGCTCATA 8400

GCCATAGTGC ATCGGCTTCA AGTACTGACT TAGGTACTAA AACCACATCA AGCTTTGACT 8460 

ATGGTACGAA GCGAACTAAC AGTACGGGTG GACACACTCA CTCTGGTAGT GGTTCTACTA 8520 

CCACAAATGG TGAGCACAGC CACTACATCG AGGCATGGAA TGGTACTGGT GTAGGTGGTA 8580 

ATAAGATGTC ATCATATGCC ATATCATACA GGGCGGGTGG GAGTAACACT AATGCAGCAG 8640 

GCAACCACAG TCACACTTTC TCTTTTGGGA CTAGCAGTGC TGGCGACCAT TCCCACTCTG 8700 

TAGCTATTGG TGCTCATACC CACACGGTAG CAATTGGATC ACATGGTCAT ACTATCACTG 8760 

TAAATAGTAC AGGTAATACA GAAAACACGG TTAAAAACAT TGCTTTTAAC TATATCGTTC 8820 

GTTTAGCATA AGGAGAGGGG CTTCGGCCCT TCTAA 8855
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1289 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacteriophage T4 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: p34 amino acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Ala Glu Ile Lys Arg Glu Phe Arg Ala Glu Asp Gly Leu Asp Ala
1 5 10 15

Gly Gly Asp Lys Ile Ile Asn Val Ala Leu Ala Asp Arg Thr Val Gly
20 25 30

Thr Asp Gly Val Asn Val Asp Tyr Leu Ile Gln Glu Asn Thr Val Gln
35 40 45

Gln Tyr Asp Pro Thr Arg Gly Tyr Leu Lys Asp Phe Val Ile Ile Tyr
50 55 60

Asp Asn Arg Phe Trp Ala Ala Ile Asn Asp Ile Pro Lys Pro Ala Gly
65 70 75 80

Ala Phe Asn Ser Gly Arg Trp Arg Ala Leu Arg Thr Asp Ala Asn Trp
85 90 95

Ile Thr Val Ser Ser Gly Ser Tyr Gln Leu Lys Ser Gly Glu Ala Ile
100 105 110

Ser Val Asn Thr Ala Ala Gly Asn Asp Ile Thr Phe Thr Leu Pro Ser
115 120 125

Ser Pro Ile Asp Gly Asp Thr Ile Val Leu Gln Asp Ile Gly Gly Lys
130 135 140
Pro Gly Val Asn Gln Val Leu Ile Val Ala Pro Val Gln Ser Ile Val
145 150 155 160

Asn Phe Arg Gly Glu Gln Val Arg Ser Val Leu Met Thr His Pro Lys
165 170 175

Ser Gln Leu Val Leu Ile Phe Ser Asn Arg Leu Trp Gln Met Tyr Val
180 185 190

Ala Asp Tyr Ser Arg Glu Ala Ile Val Val Thr Pro Ala Asn Thr Tyr
195 200 205

Gln Ala Gln Ser Asn Asp Phe Ile Val Arg Arg Phe Thr Ser Ala Ala
210 215 220

Pro Ile Asn Val Lys Leu Pro Arg Phe Ala Asn His Gly Asp Ile Ile
225 230 235 240

Asn Phe Val Asp Leu Asp Lys Leu Asn Pro Leu Tyr His Thr Ile Val
245 250 255

Thr Thr Tyr Asp Glu Thr Thr Ser Val Gln Glu Val Gly Thr His Ser
260 265 270

Ile Glu Gly Arg Thr Ser Ile Asp Gly Phe Leu Met Phe Asp Asp Asn
275 280 285

Glu Lys Leu Trp Arg Leu Phe Asp Gly Asp Ser Lys Ala Arg Leu Arg
290 295 300

Ile Ile Thr Thr Asn Ser Asn Ile Arg Pro Asn Glu Glu Val Met Val
305 310 315 320

Phe Gly Ala Asn Asn Gly Thr Thr Gln Thr Ile Glu Leu Lys Leu Pro
325 330 335

Thr Asn Ile Ser Val Gly Asp Thr Val Lys Ile Ser Met Asn Tyr Met
340 345 350

Arg Lys Gly Gln Thr Val Lys Ile Lys Ala Ala Asp Glu Asp Lys Ile
355 360 365

Ala Ser Ser Val Gln Leu Leu Gln Phe Pro Lys Arg Ser Glu Tyr Pro
370 375 380

Pro Glu Ala Glu Trp Val Thr Val Gln Glu Leu Val Phe Asn Asp Glu
385 390 395 400

Thr Asn Tyr Val Pro Val Leu Glu Leu Ala Tyr Ile Glu Asp Ser Asp
405 410 415

Gly Lys Tyr Trp Val Val Gln Gln Asn Val Pro Thr Val Glu Arg Val
420 425 430

Asp Ser Leu Asn Asp Ser Thr Arg Ala Arg Leu Gly Val Ile Ala Leu
435 440 445

Ala Thr Gln Ala Gln Ala Asn Val Asp Leu Glu Asn Ser Pro Gln Lys
450 455 460

Glu Leu Ala Ile Thr Pro Glu Thr Leu Ala Asn Arg Thr Ala Thr Glu
465 470 475 480

Thr Arg Arg Gly Ile Ala Arg Ile Ala Thr Thr Ala Gln Val Asn Gln
485 490 495

Asn Thr Thr Phe Ser Phe Ala Asp Asp Ile Ile Ile Thr Pro Lys Lys
500 505 510

Leu Asn Glu Arg Thr Ala Thr Glu Thr Arg Arg Gly Val Ala Glu Ile
515 520 525

Ala Thr Gln Gln Glu Thr Asn Ala Gly Thr Asp Asp Thr Thr Ile Ile
530 535 540

Thr Pro Lys Lys Leu Gln Ala Arg Gln Gly Ser Glu Ser Leu Ser Gly
545 550 555 560

Ile Val Thr Phe Val Ser Thr Ala Gly Ala Thr Pro Ala Ser Ser Arg
565 570 575

Glu Leu Asn Gly Thr Asn Val Tyr Asn Lys Asn Thr Asp Asn Leu Val
580 585 590

Val Ser Pro Lys Ala Leu Asp Gln Tyr Lys Ala Thr Pro Thr Gln Gln
595 600 605

Gly Ala Val Ile Leu Ala Val Glu Ser Glu Val Ile Ala Gly Gln Ser
610 615 620

Gln Gln Gly Trp Ala Asn Ala Val Val Thr Pro Glu Thr Leu His Lys
625 630 635 640

Lys Thr Ser Thr Asp Gly Arg Ile Gly Leu Ile Glu Ile Ala Thr Gln
645 650 655

Ser Glu Val Asn Thr Gly Thr Asp Tyr Thr Arg Ala Val Thr Pro Lys
660 665 670

Thr Leu Asn Asp Arg Arg Ala Thr Glu Ser Leu Ser Gly Ile Ala Glu
675 680 685

Ile Ala Thr Gln Val Glu Phe Asp Ala Gly Val Asp Asp Thr Arg Ile
690 695 700

Ser Thr Pro Leu Lys Ile Lys Thr Arg Phe Asn Ser Thr Asp Arg Thr
705 710 715 720

Ser Val Val Ala Leu Ser Gly Leu Val Glu Ser Gly Thr Leu Trp Asp
725 730 735

His Tyr Thr Leu Asn Ile Leu Glu Ala Asn Glu Thr Gln Arg Gly Thr
740 745 750

Leu Arg Val Ala Thr Gln Val Glu Ala Ala Ala Gly Thr Leu Asp Asn
755 760 765

Val Leu Ile Thr Pro Lys Lys Leu Leu Gly Thr Lys Ser Thr Glu Ala
770 775 780

Gln Glu Gly Val Ile Lys Val Ala Thr Gln Ser Glu Thr Val Thr Gly
785 790 795 800

Thr Ser Ala Asn Thr Ala Val Ser Pro Lys Asn Leu Lys Trp Ile Ala
805 810 815

Gln Ser Glu Pro Thr Trp Ala Ala Thr Thr Ala Ile Arg Gly Phe Val
820 825 830

Lys Thr Ser Ser Gly Ser Ile Thr Phe Val Gly Asn Asp Thr Val Gly
835 840 845

Ser Thr Gln Asp Leu Glu Leu Tyr Glu Lys Asn Ser Tyr Ala Val Ser
850 855 860

WO 96/11947 



PCT/US95/13023



Pro Tyr Glu Leu Asn Arg Val Leu Ala Asn Tyr Leu Pro Leu Lys Ala
865 870 875 880

Lys Ala Ala Asp Thr Asn Leu Leu Asp Gly Leu Asp Ser Ser Gln Phe
885 890 895

Ile Arg Arg Asp Ile Ala Gln Thr Val Asn Gly Ser Leu Thr Leu Thr
900 905 910

Gln Gln Thr Asn Leu Ser Ala Pro Leu Val Ser Ser Ser Thr Gly Glu
915 920 925

Phe Gly Gly Ser Leu Ala Ala Asn Arg Thr Phe Thr Ile Arg Asn Thr
930 935 940

Gly Ala Pro Thr Ser Ile Val Phe Glu Lys Gly Pro Ala Ser Gly Ala
945 950 955 960

Asn Pro Ala Gln Ser Met Ser Ile Arg Val Trp Gly Asn Gln Phe Gly
965 970 975

Gly Gly Ser Asp Thr Thr Arg Ser Thr Val Phe Glu Val Gly Asp Asp
980 985 990

Thr Ser His His Phe Tyr Ser Gln Arg Asn Lys Asp Gly Asn Ile Ala
995 1000 1005

Phe Asn Ile Asn Gly Thr Val Met Pro Ile Asn Ile Asn Ala Ser Gly
1010 1015 1020

Leu Met Asn Val Asn Gly Thr Ala Thr Phe Gly Arg Ser Val Thr Ala
1025 1030 1035 1040

Asn Gly Glu Phe Ile Ser Lys Ser Ala Asn Ala Phe Arg Ala Ile Asn
1045 1050 1055

Gly Asp Tyr Gly Phe Phe Ile Arg Asn Asp Ala Ser Asn Thr Tyr Phe
1060 1065 1070

Leu Leu Thr Ala Ala Gly Asp Gln Thr Gly Gly Phe Asn Gly Leu Arg
1075 1080 1085

Pro Leu Leu Ile Asn Asn Gln Ser Gly Gln Ile Thr Ile Gly Glu Gly
1090 1095 1100

Leu Ile Ile Ala Lys Gly Val Thr Ile Asn Ser Gly Gly Leu Thr Val
1105 1110 1115 1120

Asn Ser Arg Ile Arg Ser Gln Gly Thr Lys Thr Ser Asp Leu Tyr Thr
1125 1130 1135

Arg Ala Pro Thr Ser Asp Thr Val Gly Phe Trp Ser Ile Asp Ile Asn
1140 1145 1150

Asp Ser Ala Thr Tyr Asn Gln Phe Pro Gly Tyr Phe Lys Met Val Glu
1155 1160 1165

Lys Thr Asn Glu Val Thr Gly Leu Pro Tyr Leu Glu Arg Gly Glu Glu
1170 1175 1180

Val Lys Ser Pro Gly Thr Leu Thr Gln Phe Gly Asn Thr Leu Asp Ser
1185 1190 1195 1200

Leu Tyr Gln Asp Trp Ile Thr Tyr Pro Thr Thr Pro Glu Ala Arg Thr
1205 1210 1215

Thr Arg Trp Thr Arg Thr Trp Gln Lys Thr Lys Asn Ser Trp Ser Ser
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1220 1225 1230 

Phe Val Gin Val Phe Asp Gly Gly Asn Pro Pro Gin Pro Ser Asp II 
1235 1240 1245 

Gly Ala Leu Pro Ser Asp Asn Ala Thr Met Gly Asn Leu Thr lie Arg 
1250 1255 1260 

Asp Phe Leu Arg lie Gly Asn Val Arg lie Val Pro Asp Pro Val Asn 
1265 1270 1275 1280 

Lys Thr Val Lye Phe Glu Trp Val Glu 
1285 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 65 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Bacteriophage T4 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: ORF X amino acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

Met Glu Lys Phe Met Ala Glu Ile Trp Thr Arg Ile Cys Pro Asn Ala
1 5 10 15

Ile Leu Ser Glu Ser Asn Ser Val Arg Tyr Lys Ile Ser Ile Ala Gly
20 25 30

Ser Cys Pro Leu Ser Thr Ala Gly Pro Ser Tyr Val Lys Phe Gln Asp
35 40 45

Asn Pro Val Gly Ser Gln Thr Phe Arg Arg Arg Pro Ser Phe Lys Ser
50 55 60

Phe
65

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 295 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Bacteriophage T4 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: p35 amino acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Leu Phe Arg Leu Gln Met Ile Leu His Gln Leu Leu Leu Leu Val
1 5 10 15

Phe Met Asn Ser Leu Thr Asn Asn Arg Ile Val Ala Ile Leu Thr Ser
20 25 30

Gly Lys Val Asn Phe Pro Pro Glu Val Val Ser Trp Leu Arg Thr Ala
35 40 45

Gly Thr Ser Ala Phe Pro Ser Asp Ser Ile Leu Ser Arg Phe Asp Val
50 55 60

Ser Tyr Ala Ala Phe Tyr Thr Ser Ser Lys Arg Ala Ile Ala Leu Glu
65 70 75 80

His Val Lys Leu Ser Asn Arg Lys Ser Thr Asp Asp Tyr Gln Thr Ile
85 90 95

Leu Asp Val Val Phe Asp Ser Leu Glu Asp Val Gly Ala Thr Gly Phe
100 105 110

Pro Arg Arg Thr Tyr Glu Ser Val Glu Gln Phe Met Ser Ala Val Gly
115 120 125

Gly Thr Asn Asn Glu Ile Ala Arg Leu Pro Thr Ser Ala Ala Ile Ser
130 135 140

Lys Leu Ser Asp Tyr Asn Leu Ile Pro Gly Asp Val Leu Tyr Leu Lys
145 150 155 160

Ala Gln Leu Tyr Ala Asp Ala Asp Leu Leu Ala Leu Gly Thr Thr Asn
165 170 175

Ile Ser Ile Arg Phe Tyr Asn Ala Ser Asn Gly Tyr Ile Ser Ser Thr
180 185 190

Gln Ala Glu Phe Thr Gly Gln Ala Gly Ser Trp Glu Leu Lys Glu Asp
195 200 205

Tyr Val Val Val Pro Glu Asn Ala Val Gly Phe Thr Ile Tyr Ala Gln
210 215 220

Arg Thr Ala Gln Ala Gly Gln Gly Gly Met Arg Asn Leu Ser Phe Ser
225 230 235 240

Glu Val Ser Arg Asn Gly Gly Ile Ser Lys Pro Ala Glu Phe Gly Val
245 250 255

Asn Gly Ile Arg Val Asn Tyr Ile Cys Glu Ser Ala Ser Pro Pro Asp
260 265 270

Ile Met Val Leu Pro Thr Gln Ala Ser Ser Lys Thr Gly Lys Val Phe
275 280 285

Gly Gln Glu Phe Arg Glu Val
290 295 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 221 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Bacteriophage T4 

(vii) IMMEDIATE SOURCE:

(B) CLONE: p36 amino acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Met Ala Asp Leu Lys Val Gly Ser Thr Thr Gly Gly Ser Val Ile Trp
1 5 10 15

His Gln Gly Asn Phe Pro Leu Asn Pro Ala Gly Asp Asp Val Leu Tyr
20 25 30

Lys Ser Phe Lys Ile Tyr Ser Glu Tyr Asn Lys Pro Gln Ala Ala Asp
35 40 45

Asn Asp Phe Val Ser Lys Ala Asn Gly Gly Thr Tyr Ala Ser Lys Val
50 55 60

Thr Phe Asn Ala Gly Ile Gln Val Pro Tyr Ala Pro Asn Ile Met Ser
65 70 75 80

Pro Cys Gly Ile Tyr Gly Gly Asn Gly Asp Gly Ala Thr Phe Asp Lys
85 90 95

Ala Asn Ile Asp Ile Val Ser Trp Tyr Gly Val Gly Phe Lys Ser Ser
100 105 110

Phe Gly Ser Thr Gly Arg Thr Val Val Ile Asn Thr Arg Asn Gly Asp
115 120 125

Ile Asn Thr Lys Gly Val Val Ser Ala Ala Gly Gln Val Arg Ser Gly
130 135 140

Ala Ala Ala Pro Ile Ala Ala Asn Asp Leu Thr Arg Lys Asp Tyr Val
145 150 155 160

Asp Gly Ala Ile Asn Thr Val Thr Ala Asn Ala Asn Ser Arg Val Leu
165 170 175

Arg Ser Gly Asp Thr Met Thr Gly Asn Leu Thr Ala Pro Asn Phe Phe
180 185 190

Ser Gln Asn Pro Ala Ser Gln Pro Ser His Val Pro Arg Phe Asp Gln
195 200 205

Ile Val Ile Lys Asp Ser Val Gln Asp Phe Gly Tyr Tyr
210 215 220 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1026 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Bacteriophage T4 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: p37 amino acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Ala Thr Leu Lys Gln Ile Gln Phe Lys Arg Ser Lys Ile Ala Gly
1 5 10 15

Thr Arg Pro Ala Ala Ser Val Leu Ala Glu Gly Glu Leu Ala Ile Asn
20 25 30

Leu Lys Asp Arg Thr Ile Phe Thr Lys Asp Asp Ser Gly Asn Ile Ile
35 40 45

Asp Leu Gly Phe Ala Lys Gly Gly Gln Val Asp Gly Asn Val Thr Ile
50 55 60

Asn Gly Leu Leu Arg Leu Asn Gly Asp Tyr Val Gln Thr Gly Gly Met
65 70 75 80

Thr Val Asn Gly Pro Ile Gly Ser Thr Asp Gly Val Thr Gly Lys Ile
85 90 95

Phe Arg Ser Thr Gln Gly Ser Phe Tyr Ala Arg Ala Thr Asn Asp Thr
100 105 110

Ser Asn Ala His Leu Trp Phe Glu Asn Ala Asp Gly Thr Glu Arg Gly
115 120 125

Val Ile Tyr Ala Arg Pro Gln Thr Thr Thr Asp Gly Glu Ile Arg Leu
130 135 140

Arg Val Arg Gln Gly Thr Gly Ser Thr Ala Asn Ser Glu Phe Tyr Phe
145 150 155 160

Arg Ser Ile Asn Gly Gly Glu Phe Gln Ala Asn Arg Ile Leu Ala Ser
165 170 175

Asp Ser Leu Val Thr Lys Arg Ile Ala Val Asp Thr Val Ile His Asp
180 185 190

Ala Lys Ala Phe Gly Gln Tyr Asp Ser His Ser Leu Val Asn Tyr Val
195 200 205

Tyr Pro Gly Thr Gly Glu Thr Asn Gly Val Asn Tyr Leu Arg Lys Val
210 215 220

Arg Ala Lys Ser Gly Gly Thr Ile Tyr His Glu Ile Val Thr Ala Gln
225 230 235 240

Thr Gly Leu Ala Asp Glu Val Ser Trp Trp Ser Gly Asp Thr Pro Val
245 250 255

Phe Lys Leu Tyr Gly Ile Arg Asp Asp Gly Arg Met Ile Ile Arg Asn
260 265 270

Ser Leu Ala Leu Gly Thr Phe Thr Thr Asn Phe Pro Ser Ser Asp Tyr
275 280 285

Gly Asn Val Gly Val Met Gly Asp Lys Tyr Leu Val Leu Gly Asp Thr
290 295 300

Val Thr Gly Leu Ser Tyr Lys Lys Thr Gly Val Phe Asp Leu Val Gly
305 310 315 320

Gly Gly Tyr Ser Val Ala Ser Ile Thr Pro Asp Ser Phe Arg Ser Thr
325 330 335

Arg Lys Gly Ile Phe Gly Arg Ser Glu Asp Gln Gly Ala Thr Trp Ile
340 345 350

Met Pro Gly Thr Asn Ala Ala Leu Leu Ser Val Gln Thr Gln Ala Asp
355 360 365

Asn Asn Asn Ala Gly Asp Gly Gln Thr His Ile Gly Tyr Asn Ala Gly
370 375 380

Gly Lys Met Asn His Tyr Phe Arg Gly Thr Gly Gln Met Asn Ile Asn
385 390 395 400

Thr Gln Gln Gly Met Glu Ile Asn Pro Gly Ile Leu Lys Leu Val Thr
405 410 415

Gly Ser Asn Asn Val Gln Phe Tyr Ala Asp Gly Thr Ile Ser Ser Ile
420 425 430

Gln Pro Ile Lys Leu Asp Asn Glu Ile Phe Leu Thr Lys Ser Asn Asn
435 440 445

Thr Ala Gly Leu Lys Phe Gly Ala Pro Ser Gln Val Asp Gly Thr Arg
450 455 460

Thr Ile Gln Trp Asn Gly Gly Thr Arg Glu Gly Gln Asn Lys Asn Tyr
465 470 475 480

Val Ile Ile Lys Ala Trp Gly Asn Ser Phe Asn Ala Thr Gly Asp Arg
485 490 495

Ser Arg Glu Thr Val Phe Gln Val Ser Asp Ser Gln Gly Tyr Tyr Phe
500 505 510

Tyr Ala His Arg Lys Ala Pro Thr Gly Asp Glu Thr Ile Gly Arg Ile
515 520 525

Glu Ala Gln Phe Ala Gly Asp Val Tyr Ala Lys Gly Ile Ile Ala Asn
530 535 540

Gly Asn Phe Arg Val Val Gly Ser Ser Ala Leu Ala Gly Asn Val Thr
545 550 555 560

Met Ser Asn Gly Leu Phe Val Gln Gly Gly Ser Ser Ile Thr Gly Gln
565 570 575

Val Lys Ile Gly Gly Thr Ala Asn Ala Leu Arg Ile Trp Asn Ala Glu
580 585 590

Tyr Gly Ala Ile Phe Arg Arg Ser Glu Ser Asn Phe Tyr Ile Ile Pro
595 600 605

Thr Asn Gln Asn Glu Gly Glu Ser Gly Asp Ile His Ser Ser Leu Arg
610 615 620

Pro Val Arg Ile Gly Leu Asn Asp Gly Met Val Gly Leu Gly Arg Asp
625 630 635 640

Ser Phe Ile Val Asp Gln Asn Asn Ala Leu Thr Thr Ile Asn Ser Asn
645 650 655

Ser Arg Ile Asn Ala Asn Phe Arg Met Gln Leu Gly Gln Ser Ala Tyr
660 665 670

Ile Asp Ala Glu Cys Thr Asp Ala Val Arg Pro Ala Gly Ala Gly Ser
675 680 685

Phe Ala Ser Gln Asn Asn Glu Asp Val Arg Ala Pro Phe Tyr Met Asn
690 695 700

Ile Asp Arg Thr Asp Ala Ser Ala Tyr Val Pro Ile Leu Lys Gln Arg
705 710 715 720

Tyr Val Gln Gly Asn Gly Cys Tyr Ser Leu Gly Thr Leu Ile Asn Asn
725 730 735

Gly Asn Phe Arg Val His Tyr His Gly Gly Gly Asp Asn Gly Ser Thr
740 745 750

Gly Pro Gln Thr Ala Asp Phe Gly Trp Glu Phe Ile Lys Asn Gly Asp
755 760 765

Phe Ile Ser Pro Arg Asp Leu Ile Ala Gly Lys Val Arg Phe Asp Arg
770 775 780

Thr Gly Asn Ile Thr Gly Gly Ser Gly Asn Phe Ala Asn Leu Asn Ser
785 790 795 800

Thr Ile Glu Ser Leu Lys Thr Asp Ile Met Ser Ser Tyr Pro Ile Gly
805 810 815

Ala Pro Ile Pro Trp Pro Ser Asp Ser Val Pro Ala Gly Phe Ala Leu
820 825 830

Met Glu Gly Gln Thr Phe Asp Lys Ser Ala Tyr Pro Lys Leu Ala Val
835 840 845

Ala Tyr Pro Ser Gly Val Ile Pro Asp Met Arg Gly Gln Thr Ile Lys
850 855 860

Gly Lys Pro Ser Gly Arg Ala Val Leu Ser Ala Glu Ala Asp Gly Val
865 870 875 880

Lys Ala His Ser His Ser Ala Ser Ala Ser Ser Thr Asp Leu Gly Thr
885 890 895

Lys Thr Thr Ser Ser Phe Asp Tyr Gly Thr Lys Gly Thr Asn Ser Thr
900 905 910

Gly Gly His Thr His Ser Gly Ser Gly Ser Thr Ser Thr Asn Gly Glu
915 920 925

His Ser His Tyr Ile Glu Ala Trp Asn Gly Thr Gly Val Gly Gly Asn
930 935 940

Lys Met Ser Ser Tyr Ala Ile Ser Tyr Arg Ala Gly Gly Ser Asn Thr
945 950 955 960

Asn Ala Ala Gly Asn His Ser His Thr Phe Ser Phe Gly Thr Ser Ser
965 970 975

Ala Gly Asp His Ser His Ser Val Gly Ile Gly Ala His Thr His Thr
980 985 990

Val Ala Ile Gly Ser His Gly His Thr Ile Thr Val Asn Ser Thr Gly
995 1000 1005

Asn Thr Glu Asn Thr Val Lys Asn Ile Ala Phe Asn Tyr Ile Val Arg
1010 1015 1020 

Leu Ala
1025

What is claimed is:

1. An isolated polypeptide consisting essentially
of the gp37 tail fiber protein of bacteriophage T4 lacking
amino acids 99-496 (SEQ ID NO: 6) when numbered from the amino
terminus, wherein said polypeptide has the capability to form
dimers and interact with the P36 protein oligomer of
bacteriophage T4.

2. An isolated polypeptide consisting essentially
of a fusion protein between the gp36 and gp37 proteins of
bacteriophage T4, wherein amino acid residues 1-242 of gp37
(SEQ ID NO: 6) are fused in proper reading frame to amino acid
residues 118-221 of gp36 (SEQ ID NO: 5).

3. The polypeptide of claim 2 wherein a plurality
of carboxy termini of said polypeptide have the capability of
interacting with the amino terminus of the P37 protein
oligomer of bacteriophage T4 and to form an attached oligomer
and the amino termini of the oligomer of said polypeptide
have the capability of interacting with the carboxy termini
of gp36 polypeptides of bacteriophage T4.

4. An isolated polypeptide oligomer consisting
essentially of two gp37 polypeptides of bacteriophage T4,
wherein the amino termini of said oligomer lack the
capability of interacting with the carboxy termini of gp36
polypeptides of bacteriophage T4.

5. An isolated polypeptide oligomer consisting
essentially of the P37 protein of bacteriophage T4, wherein
the amino termini of said oligomer lack the capability of
interacting with the carboxy termini of gp36 polypeptides of
bacteriophage T4.

6. An isolated polypeptide consisting essentially
of a variant of the gp36 protein of bacteriophage T4, wherein
said polypeptide lacks the capability of interacting with the
amino terminus of the P37 protein oligomer of bacteriophage
T4.

7. An isolated polypeptide consisting essentially
of a fusion protein between the gp36 and gp34 proteins of
bacteriophage T4, wherein amino acid residues 1-73 of gp36
(SEQ ID NO: 5) are fused in proper reading frame
amino-terminal to amino acid residues 866-1289 of gp34 (SEQ
ID NO: 2).

8. An oligomer of the polypeptide of claim 7,
wherein the amino termini of said dimer have the capability
of interacting with the gp35 protein of bacteriophage T4.

9. An isolated polypeptide consisting essentially
of a variant of the gp35 protein of bacteriophage T4, wherein
said polypeptide forms an angle of less than about 125° when
combined with the P34 and P36-P37 protein oligomers of
bacteriophage T4, under conditions wherein the wild-type gp35
protein forms an angle of 137° when combined with said
oligomers.

10. An isolated polypeptide consisting essentially
of a variant of the gp35 protein of bacteriophage T4, wherein
said polypeptide forms an angle of more than about 145° when
combined with the P34 and P36-P37 protein oligomers of
bacteriophage T4, under conditions wherein the wild-type gp35
protein forms an angle of 137° when combined with said
oligomers.

11. An isolated polypeptide consisting essentially
of a variant of the gp35 protein of bacteriophage T4, wherein
the interaction of said polypeptide with the P34 protein
oligomer of bacteriophage T4 is unstable at temperatures
between about 40°C and about 60°C.

12. An isolated polypeptide oligomer consisting
essentially of a variant of the P37 protein of bacteriophage
T4, wherein the interaction of said oligomer with the P36
protein oligomer of bacteriophage T4 is unstable at
temperatures between about 40°C and about 60°C.

13. An isolated polypeptide oligomer consisting
essentially of a variant of the P37 protein of bacteriophage
T4, wherein the carboxy-terminal domain of said oligomer is
modified so as to confer the ability of the entire
polypeptide to bind specifically to an immobilized ligand.

14. The polypeptide of claim 13, wherein said
ligand is selected from the group consisting of biotin,
immunoglobulin, or divalent metal ions.

15. A nanostructure comprising a plurality of
fusion proteins, said fusion proteins comprising a first
portion consisting of at least the first 10 N-terminal amino
acids of a tail fiber protein fused via a peptide bond to a
second portion consisting of at least the last 10 C-terminal
amino acids of a second tail fiber protein, wherein the tail
fiber proteins are selected from the group consisting of
gp34, gp35, gp36, and gp37 proteins of a T-even-like
bacteriophage, wherein the first and second tail fiber
proteins are the same or different.

16. The nanostructure of claim 15, wherein the 
first and second tail fiber proteins are different. 

17. The nanostructure of claim 15, which further
comprises a molecule that can self-assemble into a dimer or
trimer, fused to at least a 10 amino acid portion of a
T-even-like tail fiber protein.

18. The nanostructure of claim 17, wherein the
molecule has the structure of a leucine zipper.

19. The nanostructure of claim 15, wherein said
nanostructure comprises a linear one-dimensional rod.

20. The nanostructure of claim 15, wherein said
nanostructure comprises a polygon.

21. The nanostructure of claim 15, wherein said
nanostructure comprises a three-dimensional cage or solid.

22. The nanostructure of claim 15, wherein said
nanostructure comprises a two-dimensional open or closed
sheet.

23. An isolated fusion protein consisting
essentially of a portion of a gp37 protein of a T-even-like
bacteriophage consisting of at least the first 10-60
N-terminal amino acids of the gp37 protein fused to a second
portion of a gp36 protein of a T-even-like bacteriophage
consisting of at least the last 10-60 C-terminal amino acids
of the gp36 protein.

24. An isolated fusion protein consisting
essentially of a portion of a gp37 protein of a T-even-like
bacteriophage consisting of at least the first 10 N-terminal
amino acids of the gp37 protein fused to a second portion of
a gp36 protein of a T-even-like bacteriophage consisting of
at least the last 10 C-terminal amino acids of the gp36
protein.

25. An isolated fusion protein consisting
essentially of a portion of a gp37 protein of a T-even-like
bacteriophage consisting of at least the first 20 N-terminal
amino acids of the gp37 protein fused to a second portion of
a gp36 protein of a T-even-like bacteriophage consisting of
at least the last 20 C-terminal amino acids of the gp36
protein.

26. An isolated fusion protein consisting
essentially of a portion of a gp36 protein of a T-even-like
bacteriophage consisting of at least the first 10-60
N-terminal amino acids of the gp36 protein fused to a second
portion of a gp34 protein of a T-even-like bacteriophage
consisting of at least the last 10-60 C-terminal amino acids
of the gp34 protein.

27. An isolated protein comprising at least 20
contiguous amino acids of the gp37, gp36, or gp34 protein of
a T-even-like bacteriophage, and lacking at least 5 amino
acids of the amino- or carboxy-terminus of the protein.

28. An isolated DNA encoding the polypeptide of
claim 1.

29. An isolated DNA encoding the polypeptide of
claim 2.

30. An isolated DNA encoding the polypeptide of
claim 4.

31. An isolated DNA encoding the polypeptide of
claim 5.

32. An isolated DNA encoding the polypeptide of
claim 6.

33. An isolated DNA encoding the polypeptide of
claim 7.

34. An isolated DNA encoding the polypeptide of
claim 9.

35. An isolated DNA encoding the polypeptide of
claim 10.

36. An isolated DNA encoding the polypeptide of
claim 11.

37. An isolated DNA encoding the polypeptide of
claim 12.

38. An isolated DNA encoding the polypeptide of
claim 13.

39. An isolated DNA encoding the protein of claim
23.

40. An isolated DNA encoding the protein of claim
25.

41. An isolated DNA encoding the protein of claim
26.

42. An isolated DNA encoding the protein of claim
27.

43. A method for making a polygonal nanostructure
comprising contacting the protein of claim 26 with purified
gp35 proteins of a T-even-like bacteriophage.

44. A method for making a nanostructure comprising
contacting a plurality of the proteins of claim 23 with each
other.

45. A kit comprising in one or more containers the
fusion protein of claim 23.

46. A kit comprising in one or more containers the
fusion protein of claim 25.

47. A kit comprising in one or more containers the
fusion protein of claim 26.

48. A kit comprising in one or more containers the
fusion protein of claim 26, and an isolated gp35 protein of a
T-even-like bacteriophage.

49. The protein of claim 23 wherein the T-even-like
bacteriophage is T4.

50. The protein of claim 26 wherein the T-even-like
bacteriophage is T4.

51. An isolated polypeptide consisting essentially
of a variant of the gp36 protein of bacteriophage T4, wherein
the interaction of said polypeptide with the P37 protein
oligomer of bacteriophage T4 is unstable at temperatures
between about 40°C and about 60°C.

52. An isolated polypeptide consisting essentially
of a variant of the gp36 protein of bacteriophage T4, wherein
the interaction of said polypeptide with the gp35 protein of
bacteriophage T4 is unstable at temperatures between about
40°C and about 60°C.

53. An isolated polypeptide consisting essentially
of a variant of the gp34 protein of bacteriophage T4, wherein
the interaction of said polypeptide with the gp35 protein of
bacteriophage T4 is unstable at temperatures between about
40°C and about 60°C.
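
The residue ranges recited in claims 1, 2 and 7 can be checked arithmetically. The following is a minimal illustrative sketch and not part of the claimed subject matter: the helper names (`segment_length`, `fusion_length`, `slice_residues`) are hypothetical, residue numbering is assumed to be 1-based and inclusive, and the protein lengths are those stated in the sequence listing (1289 residues for gp34 is inferred from the range recited in claim 7).

```python
from typing import List, Tuple

# Protein lengths in residues, taken from the sequence listing
# (gp34 length inferred from the range recited in claim 7).
LENGTHS = {"gp34": 1289, "gp35": 295, "gp36": 221, "gp37": 1026}

# (protein name, first residue, last residue), 1-based and inclusive
Segment = Tuple[str, int, int]


def segment_length(seg: Segment) -> int:
    """Number of residues in a segment, after checking it fits the protein."""
    name, first, last = seg
    if not (1 <= first <= last <= LENGTHS[name]):
        raise ValueError(f"{name} {first}-{last} lies outside 1-{LENGTHS[name]}")
    return last - first + 1


def fusion_length(segments: List[Segment]) -> int:
    """Total residues in a polypeptide built by joining segments in order."""
    return sum(segment_length(s) for s in segments)


def slice_residues(seq: str, first: int, last: int) -> str:
    """Extract residues first..last (1-based, inclusive) from a one-letter sequence."""
    return seq[first - 1:last]


# Claim 2: gp37 residues 1-242 fused to gp36 residues 118-221.
CLAIM_2_FUSION = [("gp37", 1, 242), ("gp36", 118, 221)]
# Claim 7: gp36 residues 1-73 fused amino-terminal to gp34 residues 866-1289.
CLAIM_7_FUSION = [("gp36", 1, 73), ("gp34", 866, 1289)]
# Claim 1: gp37 lacking residues 99-496, i.e. residues 1-98 joined to 497-1026.
CLAIM_1_DELETION = [("gp37", 1, 98), ("gp37", 497, 1026)]

if __name__ == "__main__":
    for label, segs in [
        ("claim 2 fusion", CLAIM_2_FUSION),
        ("claim 7 fusion", CLAIM_7_FUSION),
        ("claim 1 deletion", CLAIM_1_DELETION),
    ]:
        print(f"{label}: {fusion_length(segs)} residues")
```

Under these assumptions the three constructs come to 346, 497 and 628 residues respectively. With real one-letter sequences, `slice_residues` can be used to build the corresponding strings by concatenating the extracted segments.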



T4 Genes 34-37 seq -> List

DNA sequence 8855 b.p. TAGGAGCCCGGG ... CGGCCCTTCTAA linear

Gene 34: bp 16-3885; ORF X: bp 3894-4091; Gene 35: bp 4127-5014; Gene 36: bp 5077-5742; Gene 37: bp 5751-8831.
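
As a consistency check on the coordinates above, the length of each open reading frame can be compared with the protein lengths given in the sequence listing. This is a hedged sketch: it reads the coordinates as 16-3885, 3894-4091, 4127-5014, 5077-5742 and 5751-8831, and it assumes that each range is inclusive and ends with one stop codon. The function name `encoded_residues` is hypothetical.

```python
# Gene coordinates (first bp, last bp), inclusive, as read from the figure header.
GENE_COORDS = {
    "gene 34": (16, 3885),
    "ORF X": (3894, 4091),
    "gene 35": (4127, 5014),
    "gene 36": (5077, 5742),
    "gene 37": (5751, 8831),
}

# Protein lengths in amino acids from the sequence listing
# (SEQ ID NOs: 2-6); 1289 for gp34 is the last residue recited in claim 7.
LISTED_LENGTHS = {
    "gene 34": 1289,
    "ORF X": 65,
    "gene 35": 295,
    "gene 36": 221,
    "gene 37": 1026,
}


def encoded_residues(first_bp: int, last_bp: int) -> int:
    """Residues encoded by an inclusive bp range that ends in one stop codon."""
    span = last_bp - first_bp + 1
    if span % 3 != 0:
        raise ValueError(f"range {first_bp}-{last_bp} is not a whole number of codons")
    return span // 3 - 1  # assumption: the final codon is the stop codon


def check_coordinates() -> dict:
    """Map each gene to True if its coordinates match the listed protein length."""
    return {
        gene: encoded_residues(*coords) == LISTED_LENGTHS[gene]
        for gene, coords in GENE_COORDS.items()
    }


if __name__ == "__main__":
    for gene, ok in check_coordinates().items():
        first, last = GENE_COORDS[gene]
        print(f"{gene}: bp {first}-{last} -> {encoded_residues(first, last)} aa, "
              f"{'matches' if ok else 'differs from'} the sequence listing")
```

With the coordinates read this way, all five ranges agree with the listed protein lengths, which supports reading the ORF X start as bp 3894.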



1 

61 
121 
181 
241 
301 
361 
421 
481 
541 
601 
661 
721 
781 
B41 
901 
961 
1021 
1081 
1141 
1201 
1261 
1321 
1381 
1441 
1501 
1561 
1621 
1681 
1741 
1801 
1861 
1921 
1961 
2041 
2101 
2161 
2221 
2281 
234 1 
2401 
2461 
2521 
25B1 
2641 
2701 
2761 
2821 
2881 
2941 
3001 
3061 
3121 
3181 
3241 
3301 
3361 
3421 
3481 
3541 
3601 
3661 
3721 
3781 
3841 



I 10 
TAGGAGCCCG 
GCAGGTGCTG 
GTTAACGTTG 
TATTTAAAAG 
CCAAAACCAG 
T3GATTACGG 
ACCGCAGCTG 
ATCGTTCTCC 
GTACAAAGTA 
AAGTCACAGC 
ACTAGAGAAG 
ATCGTACGTA 
CA7CGCGATA 
GTTACTACAT 
CGTACATCGA 
GACGGGGATA 
GAAGAAGTTA 
CCAACTAATA 
CAAACAGTTA 
CAATTCCCAA 
GTTTTTAACG 
GATGGAAAAT 
AATGATTCTA 
GTCGATTTAG 
CGTACTGCTA 
CAGAACACCA 
AGAACTGCTA 
GCAGGAACCG 
GAATCATTAT 
CGTGAATTAA 
AAAGCTTTGG 
GAAAGTGAAG 
GAAACCTTAC 
CAAAGTGAAG 
GACCGTAGAG 
GACCCAGGCG 
AGTACTGATC 
GACCATTATA 
GCTACGCAGG 
CTTTTAGGTA 
GAAACTGTGA 
GCGCAGAGTG 
TCTGGTTCAA 
TATGAGAAAA 
TTGCCACTAA 
TTCATTCGTA 
AATCTGAGTG 
AATAGAACAT 
CCTGCATCCG 
GGCGGCGGTA 
CACTTTTATT 
ATGCCAATAA 
CGTTCAGTTA 
AACGGTGATT 
CCAGCCGCTG 
TCCGGTCAGA 
GGCGGTTTAA 
ACCCGTGCGC 
ACTTATAACC 
CTTCCATACT 
AACACACTTG 
ACCACTCGCT 
GTATTTGACG 
GCTACAATGG 
CCTGACCCAG 



! 20 
GGAGAATGGC 
ATAAAATAAT 
ATTACTTAAT 
AT1TTGTAAT 
CAGGAGCTTT 
TTTCATCTGG 
GAAATGACAT 
AAGATATTGG 
TTGTAAACTT 
TAGTTTTAAT 
CTATAGTTGT 
GATTTACTTC 
TTATTAATTT 
ACGATGAAAC 
TTGACGGTTT 
GTAAAGCGCG 
TGGTATTTGG 
TTTCTGTTGC 
AAATCAAAGC 
AACGCTCAGA 
ATGAAACTAA 
ATTGGGTTGT 
CTAGAGCAAG 
AAAATTCTCC 
CAGAAACTCG 
CATTCTCTTT 
CAGAAACTCG 
ATGATACTAC 
CTGGTATTGT 
ATCCTACGAA 
ATCAGTATAA 
TAATTGCTGG 
ATAAAAAGAC 
TTAATACAGG 
CAACTGAAAG 
TCGACGATAC 
GTACTTCTGT 
CACTTAATAT 
TCGAAGCTG C 
CTAAATCTAC 
CTCGAACGTC 
AACCTACTTG 
TTACATTCGT 
ATAGCTATGC 
AAGCAAAAGC 
GGGATATTGC 
CCCCTCTTGT 
TTACCATCCG 
GGGCAAATCC 
GTGATACGAC 
CTCAACGTAA 
ACATTAATGC 
CAGCCAATGG 
ACGGATTCTT 
ATCAGACTGG 
TTACAATTGG 
CTGTTAACTC 
CAACATCTGA 
AGTTCCCGGG 
TAGAACGTGG 
ATTCGCTTTA 
GGACACGTAC 
GAGGTAACCC 
GGAATCTTAC 
TGAATAAAAC 



I 30 
CGAGA TTAAA 
CAACGTAGCT 
TCAAGAAAAC 
CATTTATGAT 
TAATAGCGGA 
TTCATATCAA 
CACGTTTACT 
AGGAAAACCT 
TAGAGGTGAA 
TTTTAGTAAT 
AACACCAGCG 
TGCTGCACCA 
CGTOGATTTA 
GACTTCAGTA 
CTTGATGTTT 
TTTACGTATC 
TGCGAATAAC 
TGATACTGTT 
TGCTGATGAA 
ATATCCACCT 
TTATGTTCCA 
ACAGCAAAAC 
ATTAGGCGTA 
ACAAAAAGAA 
CAGAGGTATT 
TGCTGATGAT 
TAGAGGTGTC 
AATCATCACT 
AACCTTTGTA 
TGTTTATAAT 
AGCTACTCCA 
ACAAAGTCAG 
ATC AA CTGA T 
AACTGATTAT 
TTTAAGTGGT 
TCGTATCTCT 
TCTTGCTCTA 
TCTTGAAGCA 
TGCGGGAACA 
TCAAGCCCAA 
AGCAAATACT 
GGCAGCTACT 
TGGTAATGAT 
GGTATCACCA 
TGCTGATACA 
ACAGACGGTT 
ATCATCTAGT 
TAATACAGGA 
TGCACAGTCA 
CCGTTCGACA 
TAAAGACGGT 



TGAATTCATC 
TATTCGTAAT 
TGGTTTTAAT 
TGAAGGCTTA 
GAGAATTCGT 
TACTGTAGGA 
TTATTTTAAA 
CGAAGAAGTT 
CCAAGATTGG 
A TGGCAG AAA 
TCCTCAACCA 
TATTCGTGAT 
GGTTAAATTT 



I 40 
ACAGAATTCA 
TTAGCTGATC 
ACAGTTCAAC 
AACCGCTTTT 
CGCTGGAGAG 
TTAAAATCTG 
TTACCATCTT 
GGAGTTAACC 
CAGGTACGTT 
CGTCTGTGGC 
AATACTTATC 
ATTAATGTCA 
G A TAAACTAA 
CAAGAAGTTG 
GATGATAATG 
ATAACGACTA 
GGAACAACTC 
AAAATTTCCA 
GATAAAATTG 
GAAGCTGAAT 
GTTTTGGAGC 
GTTCCAACTG 
ATTGCTTTAG 
TTAGCAATTA 
GCAAGAATAG 
ATTATCATCA 
GCAGAAATTC 
CCTAAAAAGC 
TCTACTGCAG 
AAAAACACTG 
ACACAGCAAG 
CAAGGATGGC 
GGAAGAATTG 
ACTCCTGCAG 
ATAGCTGAAA 
ACACCATTAA 
TCTGGATTAG 
AATGAGACAC 
TTAGATAATG 
GAGGGTGTTA 
GCTGTATCTC 
ACTGCAATAA 
ACAGTCGGTT 
TATGAATTAA 
AATTTATTGG 
AATGGTTCAC 
ACTGGTGAAT 
GCCCCGACTA 
ATGAGTATTC 
GTGTTTGAAG 
AATATAGCGT 
ATGAATGTGA 
AGCAAGTCTG 
GATGCCTCTA 
GGATTACGCC 
ATCATTGCCA 
TCTCAGGGTA 
TTCTGGTCAA 
ATGGTTCAAA 
AAATCTCCTG 
ATTACTTATC 
ACCAAAAACT 
TCTGATATCG 
TTCTTGCGAA 
GAATGGGTTG 



I 50 
GAGCAGAAGA 
GTACCCTAGG 
AGTATGATCC 
GGGCTGCTAT 
CATTACGTAC 
GTCAAGCAAT 
CTCCAATTGA 
AAGTTTTAAT 
CAGTACTAAT 
AAATGTATGT 
AAGCGCAATC 
AACTTCCAAG 
ATC05CT7TA 
GAACTCATTC 
AGAAATTATG 
ATTCAAACAT 
AAACAATTGA 
TGAATTACAT 
CTTCTTCAGT 
GGGTTACAGT 
TTGCrTACAT 
TAGAAAGAGT 
CTACACAAGC 
CTCCAGAAAC 
CAACTACTGC 
CTCCTAAAAA 
CTACGCAGCA 
TTCAAGCTCG 
GTGCTACTCC 
ATAATTTAGT 
GTGCAGTAAT 
CAAATGCTCT 
GTTTAATTGA 
TCACTCCTAA 
TTGCTACACA 
AAATTAAAAC 
TTGAATCAGG 
AACGTGGTAC 
TTTTAATAAC 
TTAAAGTTGC 
CAAAAAATTT 
GAGGTTTTGT 
CTACCCAAGA 
ACCGTGTATT 
ATGGTCTAGA 
TAACCTTAAC 
TTGGTGGTTC 
GTATCCTT7T 
GTGTATGGGG 
TTGGCGATGA 
TTAACATTAA 
ATGGCACTGC 
CAAATGCTTT 
ATACCTATTT 
CATTATTAAT 
AAGGTGTTAC 
CTAAAACATC 
TCGATATTAA 
AAACTAATGA 
GTACACTCAC 
CAACGACGCC 
CTTCGTCAAG 
GTGCTTTACC 
TTGGTAATGT 
AATAAGAGGT 



I 60 
TGGTCTGGAC 
AA CTGA CGGT 
AACTCGTGCA 
AAATGATATT 
CGATGCTAAC 
TTCGGTTAAC 
TGGTGATACT 
TGTAGCTCCA 
GACTCATCCA 
TGCTGATTAT 
CAACGATTTT 
ATTTGCTAAT 
TCATACAATT 
CATTGAAGGC 
GAGACTGTTT 
TCGTCCAAAT 
GCTTAAGCTT 
GAGAAAAGGA 
TCAATTGCTG 
TCAAGAATTA 
AGAAGATTCT 
AGATTCTTTA 
TCAAGCTAAT 
GTTAGCTAAT 
TCAAGTGAAT 
GCTGAATGAA 
AGAAACTAAT 
TCAAGGTTCT 
AGCTTCTAGC 
TGTTTCACC? 
TTTAGCAGTT 
TGTAACGCCA 
AATTGCTACG 
AACTTTAAAT 
AGTTGAATTC 
CAGATTTAAT 
AACTCTCTGG 
ACTTCGTGTA 
TCCTAAAAAG 
AACTCAGTCT 
AAAATGGATT 
TAAAACTTCA 
TTTAGAACTG 
AGCAAATTA7 
TTCATCTCAG 
CCAACAAACG 
ATTGGCCGCT 
CGAAAAAGCT 
TAACCAATTT 
CACATCTCAT 
TGGTACTGTA 
AACATTCGGT 
TAGAGCAATA 
TTTGCTCACT 
TAATAATCAA 
TATAAATTCA 
TGATTTATAT 
TGATTCAGCC 
AGTGACTGGG 
TCAGTTTGGT 
AGAAGCGCGT 
TTTTGTTCAG 
ATCTGATAAT 
TCGCATTGTT 
ATTATGGAAA 



60 

120 

160 

240 

300 

360 

420 

480 

540 

600 

660 

720 

780 

840 

900 

960 

1020 

1080 

1140 

1200 

1260 

1320 

1380 

1440 

1500 

1560 

1620 

1680 

1740 

1800 

1860 

1920 

1980 

2040 

2100 

2160 

2220 

2280 

2340 

2400 

2460 

2520 

2580 

2640 

2700 

2760 

2820 

2880 

2940 

3000 

3060 

3120 

3180 

3240 

3300 

3360 

3420 

3480 

3540 
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3660 

3720 

3780 

3840 
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3901 
3961 
4021 
4061 
4141 
4201 
4261 
4321 
4381 
4441 
4*01 
4561 
4621 
4681 
4 741 
4801 
4861 
4921 
4981 
5041 
5101 
5161 
5221 
5281 
5341 
5401 
54fl 
6521 
5581 
564 1 
5701 
5761 
5821 
5881 
5941 
6001 
6061 
6121 
6181 
6241 
6301 
6361 
6421 
6481 
6541 
6601 
6661 
6721 
6781 
6841 
6901 
6961 
7021 
7081 
7141 
7201 
7261 
7321 
7381 
7441 
7501 
7561 
7621 
7681 
7741 
7801 
7861 
7921 
7981 
8041 
8101 
8161 
8221 
8281 



AATTTATGGC 
CAGTAAGATA 
ATGTTAAATT 
AGACTTTTTG 
TCAAATGATA 
AATTGTTCCT 
A^GAACCGCC 
ATATGCTCCT 
TAATAGAAAA 
AGATGTAGGA 
GGCAGTTGCT 
ATTATCTGAT 
TGATGCTCAT 
TAACGGATAT 
AAAGGAAGAT 
AACTGCACAA 
TGGCGGCATT 
CGAATCCGCT 
TAAAGTGTTT 
TTCTTTATAA 
ACAACTGGAG 
GATGTACTCT 
AACGATTTCG 
GGCATTCAAG 
GGTGATGGTG 
TTTAAATCGT 
ATTAACACAA 
ATAGCAGCGA 
GCAAATGCAA 
CCAAACTTTT 
ATCGTAATTA 
TAAAACAAAT 
TAGCCGAAGG 
CAGGAAATAT 
TTAACGGACT 
GACCCATTGG 
TTTATGCAAG 
GCACTGAACG 
TTAGGGTTAG 
ATGGAGGCGA 
TTGCGGTTGA 
TGGTTAATTA 
TTCGCGCTAA 
CTGATGAAGT 
ACGATGGCAG 
CGTCTAGTGA 
CTGTAACTGC 
CTGTTGCTTC 
CTGAGGACCA 
AAACACAAGC 
GCGGTAAAAT 
GTATGGAAAT 
ACCCTGACGC 
CTAAATCTAA 
GGACTATCCA 
AAGCATGGGG 
TATCAGATAG 
CTATTGGACC 
ACGGAAATTT 

ACGCACTGAG 
TTTATATTAT 
GACCTGTGAG 
TAGATCAAAA 
GAATGCAATT 
CGGGTGCAGG 
ATATTGATAG 
GCAATGGCTG 
ATCCCGCCGG 
TTAAAAACGG 
GAACTGGTAA 
CACTTAAAAC 
ATTCAGTTCC 
CAAACTTAGC 



CGAGATTTGG 
TAAAATAAGT 
TCAGGATAAT 
ACCCTTCCAC 
CTACATCAGC 
ATATTAACTA 
GGAACGTCTY; 
TTTTATACTT 
AGCACAGATG 
GCTACCGGGT 
GGAACTAATA 
TATAATTTAA 
TTACTTGCTC 
ATTTCTTCAA 
TATGTAGTTG 
GCTGGCCAAG 
TCGAAACCTG 
TCACCTCCGG 
GGGCAAGAAT 
ATACTATTCA 
GCTCTGTCAT 
ATAAATCATT 
TTTCTAAAGC 
TCCCATATGC 
CTACTTTTGA 
CATTTGGTTC 
AAGGTGTTGT 
ATGACCTTAC 
ACTCTAGGGT 
TCTCGCAGAA 
AGGATTCTGT 
ACAATTTAAA 
TGAATTGGCT 
CATCGATCTA 
TTTGAGATTA 
TTCTACTGAT 
AGCAACAAAC 
TGGCGTTATA 
ACAAGGAACA 
ATTTCAGGCT 
TACCGTTATT 
TCTTTATCCT 
GTCCGGTGGT 
TTCTTGGTGG 
AATCATTATC 
TTATGGCAAC 
CTTGTCATAC 
TATTACTCCT 
AGGCGCAACT 
TGATAATAAC 
GAACCACTAT 
TAACCCGGGT 
AACTATTTCT 
TAATACTGCG 
ATGGAACGGT 
TAACTCATTT 
TCAAGGATAT 
TATTGAAGCT 
TAGAGTTGTT 
CCAAGGTGGT 
AATTTGGAAC 
TCCAACCAAT 
AATAGGATTA 
TAATGCTITA 
GGGGCAGTCG 
TTCATTTGCT 
AACTGATGCT 
CTATTCATTA 
AGATAACOGT 
TGATTTTATT 
TATCACTGGT 
TGATATCATG 
TGCTGGATTT 
TGTTGCATAT 



acaacgatat 
atagcggctt 
cctgtaggaa 
cggagcatta 

TGCTTTTGTT 
CTGGAAAGGT 
CCTTTCCATC 
CTTCTAAAAG 
ATTATCAAAC 
TTCCAAGAAG 
ACGAAATTCC 
TTCCTGGAGA 
TTGGAACTAC 
CACAAGCTGA 
TTCCAGAAAA 
GTGGCATGAG 
CTGAATTTGG 
ATATAATGGT 
TTAGAGAAGT 
AATAAAGGGG 
TTGGCATCAA 
TAAAATATAT 
TAATGGTGGT 
TCCAAACATC 
TAAAGCAAAT 
AACAGGCCGA 
GTCGGCAGCT 
TAGAAAGGAC 
GCTACGGTCT 
TCCTGCATCT 
TCAAGATTTC 
AGAAGCAAAA 
ATAAACTTAA 
GGTTTTGCTA 
AATGGCGATT 
GGCGTCACTC 
GATACTTCAA 
TATGCTCGCC 
GGAAGCACTC 
AACCGTATTT 
CATGATGCCA 
GG AA CCGCTG 
ACAATTTATC 
TCTGGTGATA 
CCTAATAGCC 
GTCGGTGTAA 
AAAAAAACTG 
GACAGTTTCC 
TGGATAATGC 
AATGCTGGAG 
TTCCGTCGTA 
ATTTTGAAAT 
TCCATTCAAC 
GGTCTTAAAT 
GGTACTCGCG 
AATGCCACTG 
TATTTTTATG 
CAATTTGCTG 
GGGTCAAGCG 
TCTTCTATTA 
GCTGAATATG 
CAAAATGAAG 
AACGATGGCA 
ACTACGATAA 
GCATACATTG 
TCCCAGAATA 
AGTGCATATG 
GGGACTTTAA 
TCTACACCTC 
TCACCTCGCG 
GGTTCTGGTA 
TCGAGTTACC 
CCTTTGATGG 
CCTAGCGGTG 



GTCCAAACGC 
CTTGCCCGCT 
GTCAAACATT 
GTTCATAGTA 
AGTTTTCATG 
TAATTTTCCT 
TGATTCTATA 
AGCTATCGCA 
TATTTTAGAT 
AACGTATGAA 
GACATTGCCA 
TGTTCTTTAT 
AAATATATCT 
ATTTACTGGG 
CGCAGTAGGA 
AAATTTAAGC 
CGTCAATGGT 
ACTTCCTACG 
TTAAATTCAG 
CATACAATGG 
GGAAATTTTC 
TCAGAATATA 
ACTTATGCAT 
ATGAGCCCAT 
ATCGATATTG 
ACTGTTGTAA 
CGTCAAGTAA 
TATGTTGATG 
GGTGACACCA 
CAACCCTCAC 
GGCTATTATT 
TCGCAGGAAC 
AAGATAGAAC 
AAGGCGGGCA 
ATGTACAAAC 
CAAAAATTTT 
ATGCCCATTT 
CTCAAACTA C 
CCAACAGTGA 
TAGCATCAGA 
AAGCATTTGG 
AAACAAAT3G 
ATGAAATTCT 
CACCAGTATT 
TTGCATTAGG 
TGGGCGATAA 
GTGTATTTGA 
GTAGTACTCG 
CTCGTACAAA 
ACGGACAAAC 
CAGGTCACAT 
TGGTAACTCG 
CTATTAAATT 
TTGGAGCTCC 
AAGGACAGAA 
GTGATAGATC 
CTCATCGTAA 
GGGATGTTTA 
CTTTAGCCGG 
CTGGACAACT 
GTGCTATTTT 
GAGAAAGTGG 
TGGTTGGCTT 
ACAGTAACTC 
ATGCAGAATG 
ATCAAGACCT 
TTCCTATTTT 
TTAATAATGG 
CACAGACTGC 
ATTTAATAGC 
ATTTTGCTAA 
CAATTGCTCC 
AAGGTCAGAC 
TTATTCCACA 



CATTTTATCG 
TTCTACAGCA 
TAGGCGCAGG 
ACTCATATCC 
AATTCTTTGA 
CCTGAAGTAG 
TTCTCAAGAT 
TTAGACCATG 
GTTGTATTTG 
AGTGTTGAGC 
ACTTCAGCTG 
CTTAAAGCTC 
ATCCGTTTTT 
CAAGCTGGGT 
TTTACGATAT 
TTTTCTGAAG 
ATTCGTGTTA 
CAAGCATCGT 
GGACrCCTTCG 
CTGATTTAAA 
CATTGAATCC 
ACAAACCACA 
CAAAGGTAAC 
GCGGGATTTA 
TTTCATGGTA 
TTAATACACG 
GAAGTGGTCC 
GAGCAATAAA 
TGACAGGTAA 
ACGTTCCACG 
AAGAGCACTT 
ACGTCCTGCT 
AATTTTTACT 
AGTTGATGGC 
AGCTGGAATG 
CAGATCTACA 
ATGGTTTGAA 
AACTGACGGT 
ATTCTATTTC 
TTCGTTAGTA 
ACAATATGAT 
TGTAAACTAT 
TACTGCACAA 
TAAACTATAC 
TACATTCACT 
GTATCT7X5TT 
TCTAGTTGGC 
TAAAGGTATA 
TGCTGCTCTC 
C C AT ATCGGC 
GAATATCAAT 
CTCTAATAAT 
AGATAACGAG 
TAGCCAAGTT 
TAAAAACTAT 
TCGCGAAACG 
AGCTCCAACC 
TGCTAAAGGT 
CAATGTTACT 
TAAAATTGGC 
CCGTCGTTCG 
AGACATTCAC 
AGGAAGAGAT 
TCGCATTAAT 
TACTGATGCT 
CCGTGCCCCG 
GAAACAACGT 
TAATTTCCGA 
TGATTTTGGA 
AGGCAAAGTC 
CTTAAACACT 
TCCCATTCCT 
CTTTCATAAG 
TATGCGCGGG 



GAAAGTAATT 3960 
GGACCATCAT 4 020 
CCTTCATTTA 4080 
TTTTTCGACT 4140 
CGAATAATCC 4200 
TATCTTGGTT 42 60 
TTGACGTATC 4320 
TTAAACTGAG 4360 
ACAGTTTAGA 44 40 
AATTCATGTC 4500 
CTATAAGTAA 4560 
AGTTATATGC 4620 
ATAATGCATC 4680 
CATCGGAATT 4740 
ACGCACAGAG 4800 
TATCAAGAAA 4860 
ATTATATCTG 492C 
CTAAAACTGG 4980 
GGTTCCCTTT 5040 
AGTAGGTTCA 5100 
AGCCGGTGAC 5160 
AGCTGCTGAT 5220 
ATTTAACGCT 5280 
TGGGGGTAAC- 5340 
TGGCGTAGGA 5400 
CAATGGTGAT 5460 
GGCTGCTCCT 5520 
TACTGTTACT 5580 
TTTAACAGCG 5640 
ATTTGACCAA 5700 
ATGGCTACTT 57 60 
GCTTCAGTAT 5820 
AAAGATGA1T 5880 
AACCTTACTA 5940 
ACTGTAAACG 6000 
CAGGGTTCAT 6060 
AATGCCGATG 6120 
GAAATACGCC 6180 
CGCTCTATAA 624 0 
ACAAAACGCA 6300 
TCTCACTCTT 6360 
CTTCCTAAAG 6420 
ACAGGCCTGG 64 80 
GGTATTCGTG 6540 
ACAAATTTCC 6600 
CTCGGCGACA 6660 
GGTGGATATT 6720 
TTTGGTCGTT 6780 
TTGTCTGTTC 6840 
TACAATGCTG 6900 
ACCCAACAAG 6960 
GTACAATTTT 7020 
ATATTTTTAA 7080 
GATGGCACAA 7140 
GTGATTATTA 7200 
GTTTTCCAAG 7260 
CCCCACCAAA 7320 
ATTATTGCCA 7 3 BO 
ATGTCTAACG 7440 
GGAACAGCAA 7500 
GAAAGTAACT 7560 
AGCTCTTTGA 7620 
TCTTTTATAC 7680 
GCCAACTTTA 7740 
GTTCGCCCGG 7800 
TTCTATATGA 7860 
TATGTTCAAG 7920 
GTTCATTACC 7980 
TGGGAATTTA 8040 
AGATTTGATA 8100 
ACAATTGAAT 8160 
TGGCCGAGTG 8220 
TCCCCATATC 8280 
CAAACTATCA 8340 
8341 AOOOTAAACC AAOTOOTCOT GCT8TTXTBA GCGCTCAOGC AGATOGTSTT AAQGCTCATA 8400 
8401 CCCATAGTCC ATCOGCTTCA XGTACTOACT TAGGTACTAA AACCACATCA AGCTTTSACT 8460 
8461 ATQGTACGAA OCX3AACTAAC ACTACOOGTG GACACACTCA CTCTCGTAGT GGTTCTACTA 8520 
8521 GCACAAATGO TOAGCACAGC CACTACATCG AOGCATOGAA TGGTACTGGT GTAOGTOGTA 8580 
8581 ATAAGATGTC ATCATATGCC ATATCATACA GGOCOOCTOG GAGTAACACT AATOCAOCAQ 8640 
8641 OGAACCACAO TCACACTTTC TCTTTTOOGA CTAQCAGTGC TflGCGACCAT TCCCACTCTO 8700 
8701 TAOOTATTOG TOCTCATACC CACAC3GTAG CAATTOGATC ACATOGTCAT ACTATCACTG 8760 
8761 TAAATAGTAC AOOTAATACA GAAAACACOG TTAAAAACAT TGCTTTTAAC TATATCCTTC 8820 
8821 OTTTAOCATA AOGAQAOOOO CTTOOGCCCT TCTAA 8855 
| 10 | 20 | 30 | 40 | 50 | 60 
DNA sequence 8855 b.p. TAGGAGCCCGGG ... COCCCCTTCTAA linear 

Gene 34: bp 16-3885; ORF X: bp 3894-4091; Gene 35: bp 4127-5014; Gene 36: bp 5077-5742; Gene 37: bp 5751-8831. 

1 TAGGAGCCCGGGAGA ATG GCC GAC ATT AAA AGA GAA TTC AGA CCA GAA GAT OCT CTG GAC GCA 63 
1 MAEIKREFRAEDGLDA 16 

64 GOT GGT GAT AAA ATA ATC AAC GTA GCT TTA GCT GAT CCT ACC OTA GGA ACT GAC GGT GTT 123 

17GCDKIINVALADRTVCTDGV 36 

124 AAC GTT GAT TAC TTA ATT CAA GAA AAC ACA GTT CAA CAG TAT GAT CCA ACT COT GGA TAT 183 

37HVDYLIQENTVQQYDPTRCY 56 

184 TTA AAA GAT TTT GTA ATC ATT TAT GAT AAC CGC TTT TGG GCT GCT ATA AAT OAT ATT CCA 243 

57LKDFVI IYDNRFWAAINDIP 76 

244 AAA CCA GCA GGA GCT TTT AAT AGC GGA CGC TGG AGA GCA TTA COT ACC OAT OCT AAC TGG 303 

77KPACAFNSGRWRALRTDAKW 96 

304 ATT ACG GTT TCA TCT GGT TCA TAT CAA TTA AAA TCT GGT GAA CCA ATT TOG GTT AAC ACC 363 

97 ITVSSGSYQLKSGEAISVNT 116 

364 GCA GCT GGA AAT GAC ATC ACG TTT ACT TTA CCA TCT TCT CCA ATT GAT GOT GAT ACT ATC 423 

117AAGNDITFTLPSSPIDGDTI 136 

424 GTT CTC CAA GAT ATT GGA GGA AAA CCT GGA GTT AAC CAA GTT TTA ATT OTA OCT CCA GTA 483 

137 VLQDIGGKPGVNQVLIVAPV 156 

484 CAA AGT ATT GTA AAC TTT AGA OCT GAA CAG GTA CCT TCA GTA CTA ATG ACT CAT CCA AAC 543 

157 QSIVNFRGEQVRSVLMTHPN 176 

544 TCA CAG CTA GTT TTA ATT TTT AGT AAT CGT CTG TGG CAA ATG TAT GTT OCT GAT TAT AGT 603 

177SQLVLIFSNRLWQMYVADYS 196 

604 AGA GAA GCT ATA GTT GTA ACA CCA GOG AAT ACT TAT CAA GOG CAA TCC AAC OAT TTT ATC 663 

197 REA IVVTPANTYQAQSKDFI 216 

664 GTA CGT AGA TTT ACT TCT GCT GCA CCA ATT AAT CTC AAA CTT CCA AGA TTT GCT AAT CAT 723 

217 V R R FTSAAP I NVKLPRFANH 236 

724 GGC CAT ATT ATT AAT TTC CTC GAT TTA GAT AAA CTA AAT CCG CTT TAT CAT ACA ATT CTT 783 

237 GDIINFVDLDKLNPLYHTIV 256 

784 ACT ACA TAC GAT GAA ACC ACT TCA GTA CAA GAA GTT GCA ACT CAT TCC ATT GAA GGC CGT 843 

257 TTYDETTSVQEVGTHSIEGR 276 

844 ACA TCG ATT GAC GGT TTC TTG ATG TTT GAT GAT AAT GAG AAA TTA TGG AGA CTG TTT GAC 903 

277 TSIDGFLMFDDNEKLWRLFD 296 

904 GGG GAT AGT AAA GOG CGT TTA CGT ATC ATA ACG ACT AAT TCA AAC ATT CGT CCA AAT GAA 963 

297 CDS KARLRI ITTNSNIRPNE 316 

964 GAA GTT ATG CTA TTT GGT GCC AAT AAC GGA ACA ACT CAA ACA ATT GAG CTT AAG CTT CCA 1023 

317 EVMVFGANNGTTQTIELKLP 336 

1024 ACT AAT ATT TCT GTT GGT GAT ACT GTT AAA ATT TCC ATG AAT TAC ATG AGA AAA GGA CAA 1083 

337 TNISVCDTVKISMNYHRKGQ 356 

1084 ACA GTT AAA ATC AAA GCT GCT GAT GAA GAT AAA ATT GCT TCT TCA CTT CAA TTG CTG CAA 1143 

357 TVKIKAADEDKIASSVQLLQ 376 

1144 TTC CCA AAA CGC TCA GAA TAT CCA CCT GAA GCT GAA TGG CTT ACA GTT CAA OAA TTA GTT 1203 

377 F PKRSEY PPEAEWVTVQELV 396 

1204 TTT AAC GAT CAA ACT AAT TAT GTT CCA GTT TTG GAG CTT GCT TAC ATA CAA GAT TCT OAT 1263 

397 FNDETNYVPVLELAYIEDSD 416 

1264 GGA AAA TAT TCG GTT GTA CAG CAA AAC GTT CCA ACT GTA GAA AGA CTA GAT TCT TTA AAT 1323 

417 CKYWVVQONVPTVERVDSLN 436 

1324 GAT TCT ACT AGA GCA AGA TTA GGC GTA ATT GCT TTA GCT ACA CAA CCT CAA GCT AAT CTC 1383 

437 DSTRARLGVIALATQAQANV 456 

1384 GAT TTA OAA AAT TCT CCA CAA AAA GAA TTA GCA ATT ACT CCA GAA ACG TTA GCT AAT CCT 1443 

457 DLENSPQKELAITPETLAHR 476 

1444 ACT GCT ACA GAA ACT CGC AGA GGT ATT CCA AGA ATA OCA ACT ACT CCT CAA CTC AAT CAG 1503 

477 TATETRRGIARIATTAQVNQ 496 
1504 AAC ACC ACA TTC TCT TTT OCT GAT CAT ATT ATC ATC ACT CCT AAA AAO CTG AAT GAA AGA 1563 

497 NTTFSFADDIIITPKKLNER 516 

1564 ACT CCT ACA GAA ACT OCT AGA GGT GTC CCA GAA ATT GCT ACC CAG CAA GAA ACT AAT GCA 1623 

517TATETRRGVAEIATQQETMA 536 

1624 GGA ACC GAT GAT ACT ACA ATC ATC ACT CCT AAA AAG CTT CAA GCT OCT CAA GGT TCT GAA 1683 

537 GTDDTTIITPKKLQARQGSE 556 

1684 TCA TTA TCT GCT ATT CTA ACC TTT GTA TCT ACT GCA GGT GCT ACT CCA OCT TCT AGC OCT 1743 

557 SLSGIVTFVSTAGATPASSR 576 

1744 GAA TTA AAT CCT ACC AAT GTT TAT AAT AAA AAC ACT GAT AAT TTA CTT CTT TCA CCT AAA 1803 

577 ELNGTNVYNKNTDNLVVSPK 596 

1804 GCT TTG GAT CAO TAT AAA OCT ACT CCA ACA CAG CAA OCT GCA GTA ATT TTA OCA CTT GAA 1863 

597 ALDQYKATPTQQCAVILAVE 616 

1864 ACT GAA GTA ATT GCT GGA CAA AGT CAG CAA GGA TOG GCA AAT GCT CTT GTA ACC CCA GAA 1923 

617 S EV I A G Q S QQGWANAVVTPE 636 

1924 ACC TTA CAT AAA AAG ACA TCA ACT GAT GGA AGA ATT GGT TTA ATT GAA ATT OCT ACQ CAA 1983 

637 TLHKKTSTDGRIGLIEIATQ 656 

1984 ACT GAA GTT AAT ACA GGA ACT GAT TAT ACT CCT GCA GTC ACT CCT AAA ACT TTA AAT GAC 2043 

657 SEVNTGTDYTRAVTPKTLND 676 

2044 CCT AGA GCA ACT GAA ACT TTA ACT GGT ATA GCT GAA ATT GCT ACA CAA GTT GAA TTC OAC 2103 

677 RR ATESLSGIAEIATQVEFD 696 

2104 GCA GGC GTC GAC GAT ACT CGT ATC TCT ACA CCA TTA AAA ATT AAA ACC AGA TTT AAT AGT 2163 

697 AGVDDTRISTPLKIKTRFNS 716 

2164 ACT GAT CGT ACT TCT GTT GTT GCT CTA TCT GGA TTA GTT GAA TCA GGA ACT CTC TOG GAC 2223 

717 TDRTSVVALSGLVESCTLWD 736 

2224 CAT TAT ACA CTT AAT ATT CTT GAA CCA AAT GAG ACA CAA CGT GGT ACA CTT CGT GTA GCT 2283 

737 HYTLNILEANETQRCTLRVA 756 

2284 ACC CAG GTC GAA GCT GCT GCG GGA ACA TTA GAT AAT GTT TTA ATA ACT CCT AAA AAG CTT 2343 

757 TQVEAAAGTLDNVLITPKKL 776 

2344 TTA GGT ACT AAA TCT ACT GAA GCG CAA GAG GGT GTT ATT AAA GTT GCA ACT CAG TCT GAA 2403 

777 LGTKSTEAQEGVI KVATQSE 796 

2404 ACT GTG ACT GGA ACC TCA GCA AAT ACT GCT GTA TCT CCA AAA AAT TTA AAA TOG ATT GCG 2463 

797 TVTGTSANTAVS PKNLKWIA 816 

2464 CAG AGT GAA CCT ACT TOG GCA GCT ACT ACT GCA ATA AGA GCT TTT GTT AAA ACT TCA TCT 2523 

817 Q S E PTWAATTA I RG FVKTSS 836 

2524 GGT TCA ATT ACA TTC GTT GGT AAT GAT ACA GTC GGT TCT ACC CAA GAT TTA GAA CTG TAT 2583 

837 GSITFVGNDTVGSTQDLELY 856 

2584 GAG AAA AAT AGC TAT GCG GTA TCA CCA TAT GAA TTA AAC CCT GTA TTA GCA AAT TAT TTG 2643 

857 EKNSYAVSPYELNRVLANYL 876 

2644 CCA CTA AAA GCA AAA GCT GCT GAT ACA AAT TTA TTG CAT GGT CTA GAT TCA TCT CAG TTC 2703 

877 PLKAKAADTKLLDGLDSSQF 896 

2704 ATT CGT AGG GAT ATT GCA CAG ACG GTT AAT GGT TCA CTA ACC TTA ACC CAA CAA ACQ AAT 2763 

897 I RRDIAQTVNGSL TLTQQTN 916 

2764 CTG AGT GCC CCT CTT GTA TCA TCT AGT ACT GOT GAA TTT GGT GCT TCA TTG GCC CCT AAT 2823 

917 LSA PLVS SSTGEFGGSLAAN 936 

2824 AGA ACA TTT ACC ATC CCT AAT ACA GGA GCC CCG ACT AGT ATC GTT TTC OAA AAA GOT CCT 2883 

937 RTFTIRNTGAPTS IVFEKGP 956 

2884 GCA TCC GCG GCA AAT CCT GCA CAG TCA ATG AGT ATT CGT GTA TOG GGT AAC CAA TTT GGC 2943 

957 ASGANPAQSMSIRVWCNQFG 976 

2944 GGC GGT AGT GAT ACG ACC CGT TCC ACA CTG TTT GAA GTT GGC GAT GAC ACA TCT CAT CAC 3003 

977 GGSDTTRSTVFEVGDDTSHH 996 

3004 TTT TAT TCT CAA CGT AAT AAA GAC GGT AAT ATA GCG TTT AAC ATT AAT OCT ACT CTA ATC 3063 

997 FYSQRNKDGNIAFNINGTVM 1016 

3064 CCA ATA AAC ATT AAT GCT TCC GCT TTG ATG AAT GTC AAT GCC ACT GCA ACA TTC OCT CCT 3123 

1017 P ININASGLMNVNCTAT FGR 1036 

3124 TCA GTT ACA GCC AAT GCT GAA TTC ATC AGC AAG TCT GCA AAT GCT TTT AGA OCA ATA AAC 3183 

1037 SVTANGEF I SKSANAFRAIN 1056 
3184 OCT GAT TAC OCA TTC TTT ATT COT AAT GAT CCC TCT AAT ACC TAT TTT TTG CTC ACT OCA 3243 

1057 G DYGFFIRNDASNTYFLLTA 1076 

3244 GCC OCT CAT CAC ACT GOT OCT TTT AAT CCA TTA CCC CCA TTA TTA ATT AAT AAT CAA TCC 3303 

1077 A GDQTGCFNGLR PLLINHQS 1096 

3304 CGT CAG ATT ACA ATT GOT GAA GGC TTA ATC ATT GCC AAA GOT CTT ACT ATA AAT TCA CCC 3363 

1097 GQITICEGLIIAKCVTINSG 1116 

3364 GOT TTA ACT CTT AAC TOG ACA ATT CGT TCT CAG OCT ACT AAA ACA TCT GAT TTA TAT ACC 3423 

1117 CLTVNSRIRSQCTKTSDLYT 1136 

3424 CGT CCC CCA ACA TCT CAT ACT CTA GGA TTC TOG TCA ATC CAT ATT AAT GAT TCA GCC ACT 3483 

1137 RAPTSDTVCFWSIDINDSAT 1156 

3484 TAT AAC CAG TTC COG OCT TAT TTT AAA ATC GTT GAA AAA ACT AAT CAA GTG ACT GGG CTT 3543 

1157 Y NQFPGYFKMVEKTNEVTCL 1176 

3544 CCA TAC TTA GAA CGT GGC GAA GAA GTT AAA TCT CCT OCT ACA CTC ACT CAG TTT CGT AAC 3603 

1177 F YL ERGEEVKSPGTLTQFCK 1196 

3604 ACA CTT GAT TCC CTT TAC CAA GAT TGG ATT ACT TAT CCA ACC ACC CCA GAA COG CGT ACC 3663 

1197 TLDSLYQDWITYPTTPEART 1216 

3664 ACT CGC TGG ACA CGT ACA TGG CAG AAA ACC AAA AAC TCT TGG TCA AGT TTT CTT CAG GTA 3723 

1217 TRWTRTWQKTKNSWSSFVQV 1236 

3724 TTT CAC GGA CGT AAC CCT CCT CAA CCA TCT GAT ATC GGT GCT TTA CCA TCT GAT AAT GCT 3783 

1237 F DGCNPPQPSDIGALPSDWA 1256 

3784 ACA ATC GGG AAT CTT ACT ATT CGT GAT TTC TTC CCA ATT GGT AAT CTT CGC ATT GTT CCT 3843 

1257 TMGNLTIRDFLRIGNVRIVP 1276 

3844 GAC CCA GTG AAT AAA ACC GTT AAA TTT OAA TGG GTT GAA TAA GAGGTATT ATC GAA AAA TTT 3905 
1277 DPVNKTVKFEWVE* MEKF 4 

3906 ATG GCC GAG ATT TGG ACA AGO ATA TCT CCA AAC GCC ATT TTA TCC GAA ACT AAT TCA GTA 3965 

5MAEIWTRICPNAILSESNSV 24 

3966 AGA TAT AAA ATA AGT ATA GCC GGT TCT TCC CCC CTT TCT ACA GCA GGA CCA TCA TAT GTT 4025 

25 R YKI S IAGSCPLSTAGPSYV 44 

4026 AAA TTT CAG GAT AAT CCT GTA GGA AGT CAA ACA TTT AGO CGC AGG CCT TCA TTT AAG AGT 4085 

45 K FCDNPVGSQTFRRRPSFKS 64 

4086 TTT TGA CCCTTCCACOGGAGCATTAGTTGATAGTAAGTCAT ATG CTT TTT CCA CTT CAA ATG ATA CTA 4153 
65 F * MLFRLQMIL 9 

4154 CAT CAG CTC CTT TTG TTA GTT TTC ATG AAT TCT TTC ACC AAT AAT CCA ATT CTT GCT ATA 4213 

10HQLLLLVFMNSLTNNRIVAI 29 

4214 TTA ACT AGT GGA AAG GTT AAT TTT CCT CCT GAA CTA CTA TCT TGG TTA AGA ACC GCC GGA 4273 

30LTSGKVNFPPEVVSWLR TAC 49 

4274 AGG TCT CCC TTT CCA TCT GAT TCT ATA TTG TCA AGA TTT GAC CTA TCA TAT GCT GCT TTT 4333 

50 T SAF PSD S I LSRFDVSYAAF 69 

4334 TAT ACT TCT TCT AAA AGA GCT ATC GCA TTA GAG CAT GTT AAA CTC AGT AAT AGA AAA AGC 4393 

70YTSSKRAIALEHV KLSNRKS 89 

4394 ACA GAT CAT TAT CAA ACT ATT TTA GAT CTT GTA TTT GAC AGT TTA CAA CAT GTA GGA GCT 4453 

90 TDDYQTILDVVFDSLEDVCA 109 

4454 ACC GGG TTT CCA AGA AGA ACC TAT GAA ACT GTT GAG CAA TTC ATG TOG GCA CTT CGT GGA 4513 

110 TGFP RRTYESVEQFKSAVGG 129 

4514 ACT AAT AAC GAA ATT GCC AGA TTG CCA ACT TCA OCT GCT ATA AGT AAA TTA TCT GAT TAT 4573 

130TNNEIARL PTSAAISKLSDY 149 

4574 AAT TTA ATT CCT GCA GAT GTT CTT TAT CTT AAA OCT CAG TTA TAT GCT GAT GCT GAT TTA 4633 

150 NLIPGDVLYLKAQLYADADL 169 

4634 CTT GCT CTT GGA ACT ACA AAT ATA TCT ATC CGT TTT TAT AAT GCA TCT AAC GGA TAT ATT 4693 

170 LALGTTN ISIRFYNASNGY I 189 

4694 TCT TCA ACA CAA GCT GAA TTT ACT GGG CAA GCT GGG TCA TGG CAA TTA AAG OAA GAT TAT 4753 

190 SSTQAEFTGQAGSWELKEDY 209 

4754 OTA GTT GTT CCA GAA AAC OCA GTA OCA TTT ACQ ATA TAC GCA CAO AGA ACT CCA CAA CCT 4813 

210 VVVPENAVGFTIYAQRTAQA 229 
4814 GOC CAA OCT CCC ATG AGA AAT TTA ACC TTT TCT GAA CTA TCA ACA AAT CCC CGC ATT TCG 4873 
230 GQGCHRNLSFSE VSRN 

4874 AAA CCT OCT GAA TTT GGC GTC AAT GGT ATT CGT GTT AAT TAT ATC TGC GAA TCC GOT TCA 4933 

250 KPAEFGVNGIRVNYICESAS 269 

4934 CCT CCG GAT ATA ATG GTA CTT CCT ACG CAA GCA TCG TCT AAA ACT GGT AAA CTG TTT OGC 4993 

270 PPDIMVLPTCASSKTGKVFC 289 

4994 CAA GAA TTT AGA GAA CTT TAA ATTGAGGGACCCTTCGGGTTCCC"1 , 1T'1 , 1V.T 1 1 1 ATAAATACTATTCAAATAAA 5066 

290 Q E F R E V * *** 

5067 GGGGCATACA ATC GCT GAT TTA AAA GTA GGT TCA ACA ACT GGA GGC TCT GTC ATT TGC CAT 5127 

1 MADLKVGSTTCGSVIWH 17 

5128 CAA GGA AAT TTT CCA TTG AAT CCA GCC GGT GAC GAT GTA CTC TAT AAA TCA TTT AAA ATA 5187 

18 QGNFPLNPAGDDVLYKSFKI 37 

5188 TAT TCA GAA TAT AAC AAA CCA CAA GCT GCT GAT AAC GAT TTC GTT TCT AAA GCT AAT GGT 5247 

38 YSEYNKPQAADNDFVSKANG 57 

5248 GGT ACT TAT GCA TCA AAG GTA ACA TTT AAC GCT GGC ATT CAA GTC CCA TAT GCT CCA AAC 5307 

58 GTYASKVTFNAGIQVPYAPW 77 

5308 ATC ATC AGC CCA TGC GGG ATT TAT GGG GGT AAC GGT GAT GGT GCT ACT TTT GAT ll^ 

78IMSPCGIYCCNCDGATFDKA 97 

5368 AAT ATC GAT ATT GTT TCA TGG TAT GGC GTA GGA TTT AAA TCG TCA TTT GGT TCA ACA GGC 5427 

98 NIDIVSWYGVGFKSSFGSTG 117 

5428 CGA ACT GTT GTA ATT AAT ACA CGC AAT GGT GAT ATT AAC ACA AAA GGT GTT CTG TCG GCA 5487 

118RTVVINTRNGDINTKCVVSA 137 

5488 GCT GGT CAA GTA AGA ACT GGT GCC GCT GCT CCT ATA GCA GOG AAT GAC CTT ACT AGA AAG 5547 

138AGQVRSGAAAPIAANDLTRK 157 

5548 GAC TAT GTT GAT GGA GCA ATA AAT ACT GTT ACT GCA AAT GCA AAC TCT AGG CTG CTA CGG 5607 

158DYVDGAINTVTANANSRVLR 177 

5608 TCT GGT GAC ACC ATG ACA GGT AAT TTA ACA GOG CCA AAC TTT TTC TCG CAG AAT CCT OCA 5667 

178SGDTMTGNLTAPNFFSQNPA 197 

5668 TCT CAA CCC TCA CAC GTT CCA CGA TTT GAC CAA ATC GTA ATT AAG GAT TCT GTT CAA GAT 5727 

198SQPSHVPRFDQIVIXDSVQD 217 

5728 TTC GGC TAT TAT TAA GAGGACTT ATG GCT ACT TTA AAA CAA ATA CAA TTT AAA AGA AGC AAA 5789 
218 FCYY* MATLKQIQFKRSK 13 

5790 ATC GCA GGA ACA CGT CCT GCT GCT TCA GTA TTA GCC GAA GGT GAA TTG CCT ATA AAC TTA 5849 

14 IAGTRPAASVLAEGELAINL 33 

5850 AAA GAT AGA ACA ATT TTT ACT AAA GAT GAT TCA GGA AAT ATC ATC GAT CTA GGT TTT GCT 5909 

34 KDRTIFTKDDSGNIIDLGFA 53 

5910 AAA GGC GGG CAA GTT GAT GGC AAC GTT ACT ATT AAC GGA CTT TTG AGA TTA AAT GGC GAT 5969 

54KGGQVDGNVTINGLLRLNGD 73 

5970 TAT GTA CAA ACA GGT GGA ATG ACT GTA AAC GGA CCC ATT GGT TCT ACT GAT GGC GTC ACT 6029 

74 YVQTGCMTVNGPIGSTDGVT 93 

6030 GGA AAA ATT TTC AGA TCT ACA CAG GGT TCA TTT TAT GCA AGA GCA ACA AAC GAT ACT TCA 6089 

94GKIFRSTQCSFYA RATNDTS 113 

6090 AAT GCC CAT TTA TOG TTT GAA AAT GCC GAT GGC ACT GAA CGT GGC GTT ATA TAT OCT CGC 6149 
114 NAHLWFENADCTERCVIYAR 133 

6150 CCT CAA ACT ACA ACT GAC GGT GAA ATA OGC CTT AGG GTT AGA CAA GGA ACA GGA AGC ACT 6209 

134 PQTTTDGEIRLRVRQCTGST 153 

6210 GCC AAC ACT GAA TTC TAT TTC OGC TCT ATA AAT GGA GGC GAA TTT CAG CCT AAC OCT ATT 6269 

154 ANSEFYFRSINGGEFQANRI 173 

6270 TTA GCA TCA GAT TOG TTA GTA ACA AAA CGC ATT GCG GTT GAT ACC GTT ATT CAT GAT GCC 6329 

174LASD SLVTKRIAVDTVIHDA 193 

6330 AAA GCA TTT GGA CAA TAT GAT TCT CAC TCT TTC GTT AAT TAT GTT TAT CCT GGA ACC GGT 6389 

194KAFGQYDSHSLVKYVYPCTG 213 

6390 GAA ACA AAT GGT GTA AAC TAT CTT CGT AAA GTT CGC GCT AAG TCC GGT GGT ACA ATT TAT 6449 

214ETNGVKYLRKVRAKSGGTIY 233 

6450 CAT GAA ATT GTT ACT GCA CAA ACA GGC CTG GCT GAT GAA GTT TCT TGC TOG HV 

234 HEIVTAQTGLADEVSWWSGD 253 
6510 ACA CCA CTA TTT AAA CTA TAC OCT ATT OCT CAC GAT GGC ACA ATG ATT ATC OCT AAT AGC 6569 

254 TPVFKLVGIRDDORKIIRNS 273 

6570 CTT OCA TTA OCT ACA TTC ACT ACA AAT TTC COG TCT ACT GAT TAT OOC AAC OTC GOT OTA 6629 

274 LALGTFTTNFPSSDYOHVOV 293 

6630 ATC GGC GAT AAG TAT CTT CTT CTC GGC GAC ACT CTA ACT GGC TTC TCA TAC AAA AAA ACT 6689 

294 MCDKYLVLGDTVTGLSYKKT 313 

6690 GOT CTA TTT GAT CTA GTT GGC GGT GGA TAT TCT CTT GOT TCT ATT ACT CCT GAC ACT TTC 6749 

314GVFDLVGGGYSVASITPDSF 333 

6750 CCT AGT ACT OCT AAA GGT ATA TTT GGT OCT TCT GAG GAC CAA GGC OCA ACT TOG ATA ATG 6809 

334 RSTRKGIFGRSEDQGATWIM 353 

6810 CCT GOT ACA AAT OCT OCT CTC TTC TCT CTT CAA ACA CAA OCT OAT AAT AAC AAT CCT GGA 6869 

354 PCTNAALLSVQTQADNNNAG 373 

6870 CAC OGA CAA ACC CAT ATC COG TAC AAT CCT GGC GGT AAA ATG AAC CAC TAT TTC COT OCT 6929 

374 DGOTHICYNAGGKHNHYFRO 393 

6930 ACA OCT CAC ATG AAT ATC AAT ACC CAA CAA GGT ATG GAA ATT AAC CCO OCT ATT TTC AAA 6989 

394 TGQMNINTQQGMEINPGIFK 413 

6990 TTG CTA ACT GGC TCT AAT AAT CTA CAA TTT TAC OCT GAC OGA ACT ATT TCT TCC ATT CAA 7049 

414LVTGSNNVQFYADGTISSIQ 433 

7050 OCT ATT AAA TTA OAT AAC GAG ATA TTT TTA ACT AAA TCT AAT AAT ACT COS OCT CTT AAA 7109 

434 PIKLDNEIFLTKSNNTAGLK 453 

7110 TTT OGA CCT CCT AGC CAA GTT GAT GGC ACA AGG ACT ATC CAA TGG AAC COT OCT ACT CGC 7169 

454 FGAPSOVDGTRTIQWNGGTR 473 

7170 CAA GGA CAC AAT AAA AAC TAT GTG ATT ATT AAA CCA TGG GGT AAC TCA TTT AAT GCC ACT 7229 

474 EGQNKNYVIIKAWGNSFNAT 493 

7230 GOT GAT AGA TCT CGC GAA ACC GTT TTC CAA CTA TCA GAT ACT CAA GGA TAT TAT TTT TAT 7289 

494 GDRSRETVFQVSDSCGYYPY 513 

7290 GCT CAT OCT AAA GCT CCA ACC GGC GAC GAA ACT ATT GGA CCT ATT GAA GOT CAA TTT CCT 7349 

514 A HRKA PTGDETIGRIEAQFA 533 

7350 OOG GAT GTT TAT GCT AAA GGT ATT ATT GCC AAC GGA AAT TTT AGA GTT GTT GGG TCA AGC 7409 

534 GDVYAKGIIANCNFRVVCSS 553 

7410 GCT TTA GCC GGC AAT GTT ACT ATC TCT AAC GGT TTG TTT CTC CAA GGT GGT TCT TCT ATT 7469 

554 ALAGNVTMSNGLFVQGGSSI 573 

7470 ACT OGA CAA GTT AAA ATT GGC GGA ACA CCA AAC GCA CTC AGA ATT TGG AAC GCT GAA TAT 7529 

574 TGQVKIGGTANALRIWKAEY 593 

7530 GGT GCT ATT TTC COT OCT TOG GAA AGT AAC TTT TAT ATT ATT CCA ACC AAT CAA AAT GAA 7589 

594 GAIFRRSESNFYIIPTNQNE 613 

7590 GCA GAA AGT GCA GAC ATT CAC ACC TCT TTG AGA CCT GTG AGA ATA GCA TTA AAC GAT CGC 7649 

614GESGDIHSSLRPVRIGLNDC 633 

7650 ATC GTT GGG TTA OGA AGA GAT TCT TTT ATA GTA CAT CAA AAT AAT GCT TTA ACT ACC ATA 7709 

634 MVGLGRDSFIVDQNNALTTI 653 

7710 AAC ACT AAC TCT CGC ATT AAT GCC AAC TTT AGA ATG CAA TTG GOG CAG TOG OCA TAC ATT 7769 

654 N SNS R INANFRMQLCQSAY I 673 

7770 GAT GCA GAA TCT ACT GAT GCT GTT CGC CCG GOG GGT GCA GGT TCA TTT OCT TCC CAG AAT 7829 

674 DAECTDAVRPAGAGSFASQN 693 

7830 AAT GAA GAC CTC COT GCC CCG TTC TAT ATG AAT ATT GAT AGA ACT GAT OCT AGT GCA TAT 7889 

694 H EDVRAPF YMNIDRTDASAY 713 

7890 GTT CCT ATT TTG AAA CAA CCT TAT CTT CAA GGC AAT GGC TCC TAT TCA TTA COO ACT TTA 7949 

714 VPILKCRYVQGNCCYSLCTL 733 

7950 ATT AAT AAT GOT AAT TTC OCA CTT CAT TAC CAT GGC GCC GGA CAT AAC OCT TCT ACA COT 8009 

734 INNGNFRVHYHGCGDNGSTG 753 

8010 CCA CAG ACT GCT CAT TTT GGA TOG GAA TTT ATT AAA AAC GCT OAT TTT ATT TCA CCT CGC 8069 

754 PQTADFGWEFIKN GDFISPR 773 

8070 OAT TTA ATA OCA GGC AAA CTC AGA TTT GAT AGA ACT GGT AAT ATC ACT GOT GOT TCT COT 8129 

774 DLIAGKVRFDRTGKITGGSG 793 